Cygwin builds of CMake aren't able to use get_filename_component() to read from the registry. If we are using a Cygwin build of CMake, use regtool instead, which is provided by Cygwin.
There are numerous reasons why this change is warranted, described below:
- We do not have access to z9 machines to test the project on, so the current unofficial support is untested
- z/OS 2.2 is the minimum supported OMR OS version, which has an ALS of z10, so it is not an issue to drop z9 support from the perspective of z/OS
- SLES 12, RHEL 7.4, and Ubuntu 16.04 are the minimum supported OS versions for Linux, and according to [3] z9 is not supported on any of the OS versions specified
- The same applies to CentOS 7.4, which is based on RHEL 7.4, meaning there is no support for z9 systems there either
- z9 has been out of service since January 2019 [2]
- Updating the `-march` build flags will allow the compiler to build the project using z10 instructions, which can allow better C/C++ code to be generated and improve performance (the big one being that we can use PC relative instructions by default)
- Allows for better code maintenance and reduces complexity of the JIT compiler, particularly in the area of supporting non-relative based instruction sequences for z9
- Field research from z/OS OMs and Linux on Z OMs suggests that there is a very small footprint of users who still have z9 systems, and of those the % using the latest version of this project will be even smaller, if not zero
- Officially, no documentation needs to be updated, since we do not claim support for z9 publicly, so the transition can be transparent
Switch the Power minimumALS to use OMRProcessorArchitecture
For future use, the Power codegen's instruction properties table contains a field called minimumALS that describes the first processor architecture which introduced a given instruction. Previously, this used the TR_Processor enum, which has since been deprecated in favour of using the OMRProcessorArchitecture enum from the port library. To keep this field consistent, it has also been changed to use OMRProcessorArchitecture.
Signed-off-by: Benjamin Thomas <ben@benthomas.ca> (commit: 958b49d)
During a previous change, adding the warning flags was moved into omr_add_executable and omr_add_library. Unfortunately, the OMR compiler is not yet ready to be compiled with warnings and this causes the compiler tests to spam lots of warnings. To correct this, these functions now accept an extra "NOWARNINGS" option that disables this behaviour, which the OMR compiler uses for its tests and examples.
Fixes: #5296 Signed-off-by: Benjamin Thomas <ben@benthomas.ca> (commit: 0f7c8c3)
AArch64: Fix order of calls to maintain the proper live state of registers
This commit fixes evaluators where `TR::CodeGenerator::decReferenceCount()` is issued before a `TR::Node::setRegister()`, which kills the register too early.
Fixes in RegDepCopyRemoval for postGRA block splitter
RegDepCopyRemoval is the last optimization to run, and it is intended to reduce register shuffling due to global register dependencies. It was generating trees in which a PassThrough child sits under a PassThrough in order to copy a node to a new virtual register. Such trees did not work well with the post-GRA block splitter. The issue was that a PassThrough child encountered in GlRegDeps was anchored to a tree top. The post-GRA block splitter records the regStores encountered before the GlRegDeps in an extended basic block to guide its decision on which register to choose for the node after the split point. This commit changes both the post-GRA block splitter and the RegDepCopyRemoval optimization so that blocks can be split after the latter has run.
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: 104d2c3)
It is possible to have a PassThrough as a child of a regular node. When validating the children of a node and a PassThrough child is encountered, we need to look through it to the real child that represents the node and check that child's type against the parent.
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: e2bb845)
Fix x gprClobberEvaluate to ensure correct gc flags are set on new register
The method gprClobberEvaluate may allocate a new register. In this case, the new register may need to be flagged as a collected reference or an internal pointer. This commit ensures that these two flags are propagated from the old register.
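The flag propagation described above can be pictured with a minimal sketch. The `Register` struct and the `clobberCopy` function are hypothetical stand-ins, not the OMR `TR::Register` API; only the idea of carrying the two GC-related flags from the old register over to the new one is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified model of a virtual register (not TR::Register).
struct Register
{
    bool containsCollectedReference = false;
    bool containsInternalPointer = false;
    const void *pinningArray = nullptr; // only meaningful for internal pointers
};

// Models a clobber evaluate that has to hand back a fresh register:
// the GC flags of the source register are carried over to the copy.
inline Register clobberCopy(const Register &source)
{
    Register copy;
    copy.containsCollectedReference = source.containsCollectedReference;
    if (source.containsInternalPointer)
    {
        copy.containsInternalPointer = true;
        copy.pinningArray = source.pinningArray;
    }
    return copy;
}
```

Without the propagation, the copy would look like a plain integer register to the GC even though it holds a reference.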
Fix x iselectEvaluator to ensure correct gc flags are set on result register
- Use `TR::Register::containsCollectedReference()` instead of `TR::Node::isNotCollected()` to check if the false register contains a collected reference. `isNotCollected` should not be used after lower trees as it does not handle nodes that convert between integral and address types.
- Add assert which will fail if one of the children of the select node is an internal pointer. Internal pointers cannot be handled since we cannot set the pinning array on the result register without knowing which side of the select will be taken.
Fix z selectEvaluator to ensure correct gc flags are set on result register
- Use `TR::Register::containsCollectedReference()` instead of `TR::Node::isNotCollected()` to check if the false register contains a collected reference. `isNotCollected` should not be used after lower trees as it does not handle nodes that convert between integral and address types.
- Add assert which will fail if one of the children of the select node is an internal pointer. Internal pointers cannot be handled since we cannot set the pinning array on the result register without knowing which side of the select will be taken.
Fix aarch64 iselectEvaluator to ensure correct gc flags are set on result register
- Add code to flag the result register as a collected reference if the false register contains a collected reference.
- Add assert which will fail if one of the children of the select node is an internal pointer. Internal pointers cannot be handled since we cannot set the pinning array on the result register without knowing which side of the select will be taken.
Fix riscv gprClobberEvaluate to ensure correct gc flags are set on new register
The method gprClobberEvaluate may allocate a new register. In this case, the new register may need to be flagged as a collected reference or an internal pointer. This commit ensures that these two flags are propagated from the old register.
Fix riscv iselectEvaluator to ensure correct gc flags are set on result register
- Add code to flag the result register as a collected reference if the false register contains a collected reference.
- Add assert which will fail if one of the children of the select node is an internal pointer. Internal pointers cannot be handled since we cannot set the pinning array on the result register without knowing which side of the select will be taken.
The previous Tril tests for select opcodes were incomplete and did not test a number of corner cases. For instance, only select opcodes with a compare as their first child were being tested. These tests have been expanded to cover more cases.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 58d2d97)
The Z codegen currently has a known bug with sub-integer compares as the first child of an integral select opcode. The tests for this have been temporarily disabled until this can be resolved.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 0987b88)
The x86 codegen currently has a known bug with sub-integer compares as the first child of an integral select opcode. The tests for this have been temporarily disabled until this can be resolved.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: d1a2969)
Implement a helper to evaluate a node to a condition register
In certain evaluators (e.g. *select), a selector value needs to be used as a condition in ICF. Previously, compare nodes would need to be evaluated to a GPR, then the GPR would be compared against 0. This was obviously inefficient, since it's possible to evaluate most comparisons directly to a condition register in 1-2 instructions. Some evaluators had special handling for this, but this handling was not uniform and would frequently miss corner cases.
To address this and allow for better optimization opportunities, a new helper has been added which evaluates a node to a condition register and returns the condition which is true if the node was non-zero. For compare nodes, the comparison will be performed directly into a condition register. For other types of nodes, the old behaviour of evaluating the node to a GPR and comparing that against 0 will be used.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 5ed32ed)
Remove the concept of flipped CCRs from the Power codegen
In order to avoid FXU rejects on POWER6 chips, the Power codegen contains logic to flip the order of operands when performing a comparison when determined to be necessary.
Unfortunately, the implementation of this optimization left much to be desired. The generateTrg1Src2Instruction helper would set a special flag on the target register and silently flip the compare operand order if it was deemed to be necessary. The generateConditionalBranch helper (and related helpers) would then check this flag to determine whether they needed to flip the branch. This is obviously very surprising and also doesn't correctly handle CR logical instructions.
Since the evaluation of comparisons to a condition register has now been centralized in evaluateIntCompareToConditionRegister, the logic can now be moved there, with the condition returned being flipped rather than using hacky flags. This should prevent any unfortunate mistakes in the future.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 8ed69bc)
The Power evaluators for *select opcodes have been significantly refactored to be much simpler. Additionally a number of new optimizations have been added to reduce the overhead of these opcodes:
1. Selector nodes which are compares are now evaluated directly to a condition register to avoid the overhead of evaluating them and then comparing the result against 0.
2. The register for one of the operands is reused if possible to avoid an extra mr/fmr instruction.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 2f1baf8)
The previous implementation of iselect node evaluation used a control flow instruction called iselect to implement its control flow. Since the new implementation uses ICF instead, this instruction is no longer needed.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: bcd8e57)
Implement isel exploitation in the select evaluators
Power has long had an instruction for branchlessly selecting between two GPRs based on a condition register bit. Unfortunately, performance of this instruction left much to be desired until POWER10. Now that POWER10 has improved performance of this instruction, exploitation of it can be added to the relevant select evaluators.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 2b9283e)
Add internal pointer assert to Power select evaluator
The select evaluator in the Power codegen cannot handle select nodes with children that are internal pointers, since information about the pinning array pointer would be lost. An assert has been added to verify that internal pointers are not passed to this evaluator.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: bc037b1)
Use Snippet to load methodPointer in remote compile
Relocatable compiles such as remote compilations in OpenJ9 don't allow the use of PC relative instructions like LARL. This commit prevents the use of such an instruction when trying to load a methodPointerConstant.
AArch64: Fix memory reference to kill registers properly
When the node of memory reference is `aconst`/`iconst`/`lconst`, a base register may not be killed in `decNodeReferenceCounts` because both `_baseRegister` and `_baseNode` are not NULL.
- Introduce `assocreg` pseudo instruction.
- Keep track of the real register to virtual register mappings which are set up by register dependencies.
- Emit `assocreg` instructions at the end of every extended basic block.
- Emit `assocreg` instructions when setting up register dependencies.
- In register assignment phase, update virtual register to real register mappings when we encounter an `assocreg` instruction.
After disableTOC is turned on in POWER10 and TOCBaseRegister is about to be recovered as a usual preserved register, we should get rid of any reference to TOCBaseRegister. Otherwise, we will crash in register assignment phase.
We set the baseRegister in the fake memRef to NULL instead, given the pTOC is full by definition due to disableTOC.
Set up gr12 for systemLinkage dispatch for LE Linux naturally through PC-relative constant snippet or direct materialization.
Signed-off-by: Julian Wang <zlwang@ca.ibm.com> (commit: b3cfdbd)
AArch64: Fix memory reference for indirect load of const node
This commit fixes a problem of memory reference when root node is indirect load and the child of that node is const. The problem was introduced by OMR #5520.
This commit makes the following changes to the register dependency and instruction classes:
- Change the `addDependency()` helper method to set the placeholder flag of the virtual register if the virtual register is allocated in the method.
- Add a `useRegister()` method to the `OMR::ARM64::Instruction` class to reset the placeholder flag if the virtual register is actually used by an instruction.
- Existing assert expanded in `MM_ParallelDispatcher::workerEntryPoint` to cover task dispatch during shutdown
- Asserting more strictly with new assert in added else clause
- Added static helper for each MemoryReference constructor
- Renamed/refactored withDisplacement and withLabel helpers to createWithDisplacement and createWithLabel to match naming convention of other helpers
Add a function to generate a DF flag check in JIT-compiled code and trap when DF is incorrectly set. Downstream projects can add the check wherever they want.
The motivation for this check is to catch cases where memory can be corrupted because an incorrect DF flag affects `rep` instructions. One such case was found in OpenJ9, but the memory corruption is hard to reproduce, and not every memory corruption results in a test failure or crash.
However a wrong DF flag should be easier to detect if the check is executed frequently enough.
Signed-off-by: Liqun Liu <liqunl@ca.ibm.com> (commit: 5ba456b)
There are some switch statements on enum classes in ControlFlowEvaluator.cpp that were previously missing default cases. While these switches are exhaustive, the way enum classes work in C++ means that technically any int can be used, so default cases are needed to make sure that such errors are caught.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: d1e6b9e)
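A minimal sketch of the pattern, assuming a made-up `CompareCondition` enum rather than the enums actually used in ControlFlowEvaluator.cpp: the switch names every enumerator, yet a value produced by a cast can still fall outside them, and the default case is what catches it.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical condition enum, for illustration only.
enum class CompareCondition { eq, ne, lt, ge };

// Returns the logical negation of a condition. Every named enumerator is
// handled, but the default case still guards against out-of-range values.
inline CompareCondition reverseCondition(CompareCondition cond)
{
    switch (cond)
    {
        case CompareCondition::eq: return CompareCondition::ne;
        case CompareCondition::ne: return CompareCondition::eq;
        case CompareCondition::lt: return CompareCondition::ge;
        case CompareCondition::ge: return CompareCondition::lt;
        default:
            throw std::logic_error("unexpected CompareCondition value");
    }
}
```

Here the error is reported with an exception to keep the sketch self-contained; a compiler would more likely use its own assertion facility.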
Replace MemoryReference constructors with static helpers
All instances where MemoryReference objects were created using a direct call to a MemoryReference constructor were replaced with a call to the corresponding static helper method
Remove and replace legacy MemoryReference constructor
Previously, there were two MemoryReference constructors that accepted a displacement as one of their parameters: one that took it as an int32_t (a legacy constructor) and one that took it as an int64_t. To ensure that a call to either one of these constructors was never ambiguous, an extra int parameter was added at the end of the int64_t constructor. With this commit, the legacy constructor has been removed and replaced with the int64_t one, and the extra int parameter has been removed.
Change MemoryReference constructors from public to private/protected
Since MemoryReference constructors should no longer be called directly to create MemoryReference objects, they have been changed from public to private/protected
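The shape of this MemoryReference refactoring can be sketched as follows. The class below is a heavily simplified, hypothetical model: the real helpers take more parameters, and only the helper name `createWithDisplacement` comes from the commit messages. It shows a single int64_t constructor hidden behind a static helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a codegen register class.
struct Register {};

class MemoryReference
{
public:
    // Static helper: the only public way to build a displacement-based reference.
    static MemoryReference createWithDisplacement(const Register *base, int64_t displacement)
    {
        return MemoryReference(base, displacement);
    }

    const Register *base() const { return _base; }
    int64_t displacement() const { return _displacement; }

private:
    // One int64_t constructor: with the int32_t overload gone, no extra
    // dummy parameter is needed to keep calls unambiguous.
    MemoryReference(const Register *base, int64_t displacement)
        : _base(base), _displacement(displacement) {}

    const Register *_base;
    int64_t _displacement;
};
```

Callers write `MemoryReference::createWithDisplacement(&reg, 16)`; a direct `MemoryReference(&reg, 16)` no longer compiles outside the class.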
Add a flag to indicate codegen support for byteswaps
In order for language frontends to be able to make use of the new byteswap opcodes, they need a way to determine whether the codegen for the architecture they're running on supports these opcodes. To allow this, a new flag has been added to indicate codegen support for sbyteswap, ibyteswap, and lbyteswap.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 6df85c1)
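For readers unfamiliar with these opcodes, the sketch below shows the value transformation they denote, assuming the usual OMR naming in which the `s`, `i`, and `l` prefixes refer to 16-, 32-, and 64-bit integers. The function names are made up for illustration; this is portable C++, not codegen code.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the two bytes of a 16-bit value (the effect of sbyteswap).
inline uint16_t byteswap16(uint16_t v)
{
    return static_cast<uint16_t>((v << 8) | (v >> 8));
}

// Reverses the four bytes of a 32-bit value (the effect of ibyteswap).
inline uint32_t byteswap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Reverses the eight bytes of a 64-bit value (the effect of lbyteswap):
// swap each half, then exchange the halves.
inline uint64_t byteswap64(uint64_t v)
{
    uint64_t low = byteswap32(static_cast<uint32_t>(v));
    uint64_t high = byteswap32(static_cast<uint32_t>(v >> 32));
    return (low << 32) | high;
}
```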
Previously, codegens could declare support for the ibyteswap opcode by overriding a virtual function called getSupportsIbyteswap(). It is now expected that codegens will implement support for all of the byteswap opcodes at the same time. Additionally, this query was never previously used, so it has now been removed.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 7d68a30)
The sbyteswap and lbyteswap opcodes are now supported in the Power codegen using code taken from OpenJ9's handling of the recognized Short.reverseBytes and Long.reverseBytes methods.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 8e2a3a7)
As it turns out, the x codegen already has the code necessary to support sbyteswap and lbyteswap in OMR. The tree evaluator tables have thus been updated to enable this support.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 9cff620)
Previously, the Z codegen had the necessary code in OMR to support the sbyteswap, ibyteswap, and lbyteswap opcodes. However, this was not yet exposed to OMR consumers via implementing these opcodes. This code has now been enabled for implementing these opcodes.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: f5e75be)
Previously, the evaluators for byteswap on all architectures contained checks against the number of children on the node being evaluated. However, the IL validator should already be checking this, so these asserts are not needed.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 338560d)
Check whether TLS storage for comp object was created before freeing it
When the JIT fails initialization it proceeds with the shutdown routines. In here the JIT tries to deallocate the TLS storage for the compilation object. However, because the JIT failed so early, this storage might not have been allocated in the first place. This PR checks whether the TLS storage was allocated before attempting to free it.
Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com> (commit: 98c6c84)
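The guard added by the TLS change can be modelled in a few lines. `TlsSlot`, `allocateSlot`, and `freeSlotIfAllocated` are hypothetical names used only for this sketch; a real implementation would wrap the platform's TLS key API instead of a boolean.

```cpp
#include <cassert>

// Hypothetical model of a TLS slot.
struct TlsSlot
{
    bool allocated = false;
};

inline void allocateSlot(TlsSlot &slot) { slot.allocated = true; }

// Frees the slot only if it was allocated. Returns true when a free happened.
inline bool freeSlotIfAllocated(TlsSlot &slot)
{
    if (!slot.allocated)
        return false; // initialization failed before allocation: nothing to free
    slot.allocated = false;
    return true;
}
```

With the guard in place, shutdown can be called safely regardless of how far initialization got.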
Add Dynamic Breadth First Scan Ordering to the GC implemented using JIT hot fields. This enables the copying of a hot field marked by the JIT immediately after the object containing the hot field is copied during a Scavenge.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: 66a8c67)
The assert is too restrictive. It has been in place for a very long time and I believe the situation it is trying to prevent is one where a register spill is inserted in between the call instruction and the VFP cleanup instruction (because the spill instruction uses the stack pointer and may get an incorrect offset). Restricting this special instruction's use to ICF regions would ensure that because (in theory) no spilling should be happening within an ICF. Hence the assert.
However, there are circumstances in carefully constructed call sequences where this instruction could be used outside of an internal control flow region where register assignment and spilling won't be a problem between the call instruction and the VFP cleanup instruction.
Like any instruction, when used properly in the right context this instruction will behave correctly.
A read only code cache cannot support recompilation, because new method bodies cannot be placed into a protected code cache. Hence this commit disables recompilation when read only code cache is enabled.
Disable side-effect guards under read only code cache
Side-effect guards can require runtime assumptions that have to patch the code, which is not allowed. Therefore, disable the generation of side-effect guards until such time as the runtime assumptions can patch a write-able data area, rather than the code cache.
Previously, the ZEROCHK evaluator in the Power codegen had a lot of special casing to try to perform comparisons directly into condition registers. Since functionality has now been added to perform this optimization in a more general manner, this code can now be replaced with a simple call to evaluateToConditionRegister.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 7f0cab9)
Two functions are provided for asking whether a given code cache address resides in a read-only (RX) code cache or not. While their primary motivation is use at runtime, they can be used at either compile-time or runtime.
Add pointer to owning metadata in sentinel assumption
This commit facilitates the ability to serialize runtime assumptions. Any assumptions that have to be deserialized (i.e. reified) are inserted into the RAT as well as into a circular linked list hanging off the metadata for the body - this is the paradigm used in OpenJ9 anyway. As such, this commit adds a field to the sentinel assumption to point to the owning metadata so that that information can also be serialized.
The API to serialize a runtime assumption should also take the owning metadata, as that information would be necessary during reification if the desire is to attach those runtime assumptions targeting a particular compiled body with a metadata structure associated with said compiled body.
Replace uses of TR::comp() with cheaper alternatives
The `TR::Compilation` object is sometimes fetched from TLS (via `TR::comp()`) when there are cheaper alternative means available (e.g., fetching it from a `TR::CodeGenerator` object). Replace some obvious candidates. Also, when the TLS version must be used, optimize its use to avoid repeated calls in the same method.
Introduce CodeGenerator constructors and initialize functions
* Create a new CodeGenerator constructor that takes a TR::Compilation parameter. The constructor hierarchy only contains member initialization lists.
A mirrored hierarchy was created to permit downstream projects to adapt to the hierarchy changes. The default constructors will eventually be deleted and replaced with private empty stubs.
This constructor will eventually be invoked via a static factory function.
* Create a CodeGenerator::initialize() method to be called after object construction to initialize the CodeGenerator object. The initialize() functions take the place of the original constructor bodies and must be invoked in the same order.
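The two-phase construction described in these bullets can be sketched as below. The class names, the `int` standing in for a `TR::Compilation`, and the `create()` factory are all assumptions made for illustration; the point is only that constructors contain nothing but member initialization lists, while `initialize()` carries the former constructor bodies and runs base-first.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class; compilationId stands in for a compilation object.
class BaseCodeGenerator
{
public:
    explicit BaseCodeGenerator(int compilationId)
        : _compilationId(compilationId), _log() {}

    // Former constructor body of the base class.
    void initialize() { _log += "base;"; }

    int compilationId() const { return _compilationId; }
    const std::string &log() const { return _log; }

protected:
    int _compilationId;
    std::string _log;
};

// Hypothetical architecture-specific subclass.
class ArchCodeGenerator : public BaseCodeGenerator
{
public:
    explicit ArchCodeGenerator(int compilationId)
        : BaseCodeGenerator(compilationId) {}

    void initialize()
    {
        BaseCodeGenerator::initialize(); // same order the constructor bodies ran in
        _log += "arch;";
    }

    // Factory that pairs construction with initialization.
    static ArchCodeGenerator create(int compilationId)
    {
        ArchCodeGenerator cg(compilationId);
        cg.initialize();
        return cg;
    }
};
```

Calling `ArchCodeGenerator::create(1)` leaves the log as `base;arch;`, mirroring the order in which the original constructor bodies would have executed.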
Fix the assert added in RegDepCopyRemoval to consider the missing case where the child of GlRegDeps is a PassThrough with a regLoad child and both use the same register.
Fixes: eclipse/openj9#10630
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: 7aacdcb)
- Attribute `stalltimems` is added to `gc-end` to report time spent stalling (in ms).
- New API `getStallTime()` introduced for `MM_GlobalGCStats`
- Expanded `MM_CollectionStatistics` to include `_stallTime`
- New attribute is added to `gc-end` in the schema.
- Update `MM_VerboseHandlerOutput::getTagTemplateWithDuration()` to reflect the addition of `stalltimems`
Support Int64 Length Node within setmemoryEvaluator
setMemoryEvaluator generates a sequence of instructions that initializes a segment of memory of a specified length with a specified value. This commit enables Int64 Length Node support, while maintaining support for Int32.
It seems silly that by default all Relocations are considered External relocations when only those that are in the `ExternalRelocation` hierarchy should be considered as such.
Flip the default answer returned from `isExternalRelocation()` to be false, and have only the relocations in the `ExternalRelocation` hierarchy return true.
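In class-hierarchy terms, flipping the default looks roughly like the sketch below. The class names are simplified placeholders rather than the actual OMR relocation classes; only the `isExternalRelocation()` query and the idea of an `ExternalRelocation` sub-hierarchy come from the commit message.

```cpp
#include <cassert>

// Placeholder base class: relocations are not external unless they say so.
class Relocation
{
public:
    virtual ~Relocation() = default;
    virtual bool isExternalRelocation() const { return false; }
};

// A relocation outside the external hierarchy inherits the default answer.
class LabelRelocation : public Relocation {};

// Root of the external hierarchy: the only place the answer is overridden.
class ExternalRelocation : public Relocation
{
public:
    bool isExternalRelocation() const override { return true; }
};

// Anything deriving from ExternalRelocation inherits "true".
class DerivedExternalRelocation : public ExternalRelocation {};
```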
Add j9VMThreadTempSlotField symbol in Symbol Reference Table
j9VMThreadTempSlotFieldSymbol is a non-helper symbol that will provide a mechanism for the compiler to insert temporary information that the VM can use - such as the number of args when calling signature-polymorphic methods that are implemented in the VM as internal natives. The VM can use that information to locate items on the stack.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: 7d72e3f)
AArch64: Add another method to kill temporary registers added to register dependency
Add a variant of the `stopUsingDepRegs` method to the register dependency class. The method takes iterators to the list of registers as arguments. The list contains the registers we do not want to kill.
In Scavenger, when we clear stats for each GC cycle, we fail to clear two fields relevant to array splitting, causing those stats to accumulate from the beginning of the run.
Marking array splitting is ok.
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: 98f94bd)
There are locations within the OMR compiler where we write to the verbose log without first acquiring the verbose log lock. This patch addresses the problem.
Issue: #5154
Signed-off-by: Allan Manuba <manuba@ualberta.ca> (commit: 8e4bdfc)
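The discipline this verbose log fix enforces, namely taking the lock before every write, can be illustrated with a small stand-alone class. `VerboseLog` here is a hypothetical model built on `std::mutex`; it does not reflect how the OMR verbose log or its lock is actually implemented.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical log whose writes are always performed under its lock.
class VerboseLog
{
public:
    // Appends one line; the guard holds the lock for the whole write so
    // lines from different threads cannot interleave.
    void writeLine(const std::string &message)
    {
        std::lock_guard<std::mutex> guard(_lock);
        _contents += message;
        _contents += '\n';
    }

    // Returns a snapshot of the log, also taken under the lock.
    std::string contents()
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _contents;
    }

private:
    std::mutex _lock;
    std::string _contents;
};
```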
AArch64: Change ConstantDataSnippet to use targetaddress2 for non-aconst node
Change ConstantDataSnippet to use targetaddress2 parameter to constructors of `TR_ExternalRelocation` class if the node is not `aconst` because some evaluators in downstream projects would use ConstantDataSnippet for loading address constant with non-aconst node.
The `_callNode` field in the call site may be `NULL` which was not being checked in code paths guarded by BCD tracing. There was a bug exposed due to recent changes to enable BCD tracing when `traceCG` is enabled in #5165. This change activated the BCD tracing code path which exposed the `NULL` dereference.
The BCD tracing here is actually not very useful because there are `heuristicTrace` statements immediately above which trace pretty much the same information. Thus the safest and easiest thing to do is to just remove these extra BCD traces.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: c2df848)
Previously, OpenJ9 had a number of opcodes for performing byteswapped loads and stores, implementation details of which have leaked into OMR. Since OMR now supports byteswap opcodes for all of the types for which these reverse loads/stores were implemented, these opcodes are being removed.
Closes: #5149 Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: ba239bc)
Update the double map API to allow reserving a contiguous address range at a FIXED address.
Add a release double map region API to free the memory associated with a double mapping.
Include an extra test for the new version of the double map API.
Refactor the double map API to generalize method and variable names. Discontiguous arraylets from the balanced GC policy are one use case of the double mapping API, but the API can be used by any consumer that wants to double map multiple regions of memory into a single contiguous region.
mmap(2) on OSX is limited in that it won't allow a mapping that is backed by a file (i.e. has a file descriptor) to be combined with huge pages. That's because the huge pages flag is specified using the file descriptor field; therefore, we cannot have both double mapping, which requires the heap to be backed by a file descriptor, and huge pages, since both use the same field. In that scenario, if huge pages are available and double mapping is requested, huge pages take precedence over double mapping.
Signed-off-by: Igor Braga <higorb1@gmail.com> (commit: 5b3b915)
Replacing isAbstractClass() and isInterfaceClass() with isConcreteClass()
isConcreteClass() checks whether the class is an interface or an abstract class in a single query. This reduces the overhead of calling two separate queries. Issue: #11029
Signed-off-by: Eman Elsabban <eman.elsaban1@gmail.com> (commit: 50d4fc1)
This commit fixes `TR::ARM64TestBitBranchInstruction`.
- Support branching to OOLCodeSection.
- Use correct relocation class as the destination is encoded in imm14 field.
Previously, the Power codegen contained an optimization for performing a store of a byteswapped value in a single instruction. Unfortunately, the initial implementation of this optimization did not correctly check that the byteswap node in question was not commoned, which could result in erratic behaviour. This has now been corrected.
Issue: #5642 Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: e93399a)
Add API to get the sentinel Runtime Assumption associated with the current Runtime Assumption
Since the sentinel assumption has a field pointing to metadata associated with the same body as the list of assumptions, it is useful to be able to retrieve the sentinel assumption given a regular assumption.
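One way to picture such a lookup, assuming the assumptions for a body form a circular singly linked list containing exactly one sentinel: walk forward from any assumption until the sentinel is reached. The `Assumption` struct and `getSentinel` function below are hypothetical and much simpler than the real runtime assumption classes.

```cpp
#include <cassert>

// Hypothetical, simplified runtime assumption node.
struct Assumption
{
    Assumption *next = nullptr;
    bool isSentinel = false;
    const void *owningMetadata = nullptr; // set on the sentinel only
};

// Walks the circular list until the sentinel is found.
// Precondition: the list is circular and contains exactly one sentinel.
inline Assumption *getSentinel(Assumption *assumption)
{
    Assumption *cursor = assumption;
    while (!cursor->isSentinel)
        cursor = cursor->next;
    return cursor;
}
```

From the returned sentinel, the owning metadata is then one field access away.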
Fix the badIlOp node generated by RegDepCopyRemoval
When creating a fresh copy for the RegDepCopyRemoval optimization, there was a case where a regLoad sits under a PassThrough node, in which case there is no corresponding regStore. To handle this case we were creating a regStore by taking the type from the PassThrough node, which caused a badIlOp. This commit fixes that issue.
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: a19ea6a)
As it turns out, the previous code for performing a byteswapped store of a long on ppc32 was broken and would end up creating a memory reference with both an index register and an offset. This has been corrected.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 9b282a1)
Previously, the codegen optimization for performing an lbyteswap and an lstore in one stdbrx instruction was not properly checking that the stdbrx instruction was actually available on the current processor. Since this instruction was only added in P7, this could result in P6 and earlier processors encountering an illegal instruction.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: b022576)
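The missing availability check amounts to gating the single-instruction path on the processor level. In the sketch below, the enums and function names are invented for illustration; the only fact taken from the commit message is that stdbrx is available from P7 onward.

```cpp
#include <cassert>

// Hypothetical processor levels, ordered oldest to newest.
enum class PowerProcessor { P6, P7, P8 };

// Hypothetical code generation strategies for a byteswapped long store.
enum class LongByteswapStore { SingleStdbrx, Fallback };

// stdbrx was introduced in P7, so older processors must not use it.
inline bool supportsStdbrx(PowerProcessor processor)
{
    return processor >= PowerProcessor::P7;
}

// Picks the fused store only when the instruction is actually available.
inline LongByteswapStore selectLongByteswapStore(PowerProcessor processor)
{
    return supportsStdbrx(processor) ? LongByteswapStore::SingleStdbrx
                                     : LongByteswapStore::Fallback;
}
```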
Common RealRegister field in RegisterDependencyStruct
Hoist the TR::RealRegister::RegNum field from each architecture into the base class so that a single implementation can be shared. Move the getter and setter as well.
Delete architecture-specific RegisterDependencyStructs that are left empty because of this.
Replace reg dep RealRegister property checks with meaningful queries on x86
Register dependencies rely on "special" RealRegister values to convey other properties about the register dependency, such as whether a byte register must be assigned, or whether any assignable register is suitable for this dependency. The long-term goal is to not rely on such special values in the RealRegister enum to convey these properties. To get there, this commit replaces direct tests of the real register for queries on the register dependency itself for the property in question.
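As a rough illustration of this series of changes, the sketch below contrasts testing the real register value directly with asking the dependency. The `RegNum` values and the `RegisterDependency` class are hypothetical simplifications. The queries still compare against the special values internally, but callers no longer need to know that, which is what allows the encoding to change later.

```cpp
#include <cassert>

// Hypothetical register numbering with two "special" pseudo values.
enum RegNum { NoReg, gr0, gr1, ByteReg, AssignAny };

class RegisterDependency
{
public:
    explicit RegisterDependency(RegNum realRegister)
        : _realRegister(realRegister) {}

    RegNum getRealRegister() const { return _realRegister; }

    // Property queries: callers use these instead of writing
    // dep.getRealRegister() == ByteReg themselves.
    bool isByteReg() const { return _realRegister == ByteReg; }
    bool isAssignAny() const { return _realRegister == AssignAny; }

private:
    RegNum _realRegister;
};
```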
Replace reg dep RealRegister property checks with meaningful queries on Power
Register dependencies rely on "special" RealRegister values to convey other properties about the register dependency, such as whether a byte register must be assigned, or whether any assignable register is suitable for this dependency. The long-term goal is to not rely on such special values in the RealRegister enum to convey these properties. To get there, this commit replaces direct tests of the real register for queries on the register dependency itself for the property in question.
Replace reg dep RealRegister property checks with meaningful queries on AArch64
Register dependencies rely on "special" RealRegister values to convey other properties about the register dependency, such as whether a byte register must be assigned, or whether any assignable register is suitable for this dependency. The long-term goal is to not rely on such special values in the RealRegister enum to convey these properties. To get there, this commit replaces direct tests of the real register for queries on the register dependency itself for the property in question.
Replace reg dep RealRegister property checks with meaningful queries on RISC-V
Register dependencies rely on "special" RealRegister values to convey other properties about the register dependency, such as whether a byte register must be assigned, or whether any assignable register is suitable for this dependency. The long-term goal is to not rely on such special values in the RealRegister enum to convey these properties. To get there, this commit replaces direct tests of the real register for queries on the register dependency itself for the property in question.
Replace reg dep RealRegister property checks with meaningful queries on ARM
Register dependencies rely on "special" RealRegister values to convey other properties about the register dependency, such as whether a byte register must be assigned, or whether any assignable register is suitable for this dependency. The long-term goal is to not rely on such special values in the RealRegister enum to convey these properties. To get there, this commit replaces direct tests of the real register for queries on the register dependency itself for the property in question.
Modify some Z RegisterDependencyConditions APIs to accept a TR::RegisterDependency
In compiler/z/codegen/OMRRegisterDependency.cpp:
* Add alternate APIs for `addPreConditionIfNotAlreadyInserted()` and `addPostConditionIfNotAlreadyInserted()` that accept a `TR::RegisterDependency *` parameter.
* Modify the APIs for `doesPreConditionExist()`, `doesPostConditionExist()`, and `doesConditionExist()` to accept a `TR::RegisterDependency *` parameter. Eliminate the unused optional parameter `overwriteAssignAny` from all.
Replace reg dep RealRegister property checks with meaningful queries on Z
Register dependencies rely on "special" RealRegister values to convey other properties about the register dependency, such as whether a byte register must be assigned, or whether any assignable register is suitable for this dependency. The long-term goal is to not rely on such special values in the RealRegister enum to convey these properties. To get there, this commit replaces direct tests of the real register for queries on the register dependency itself for the property in question.
This commit also fixes a number of compiler warnings on Z where `RealRegister::RegNum` and `RealRegister::RegDep` enums were compared directly to test for these properties. Use a static_cast instead.
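The idea behind these register dependency commits can be sketched as follows. This is a minimal illustration, not the OMR implementation: `RegNumSketch`, `RegisterDependencySketch`, and the query names are hypothetical stand-ins chosen for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a RealRegister enum. The last two entries are
// "special" values that describe a property rather than a real register.
enum RegNumSketch { NoRegSketch, GPR0Sketch, GPR1Sketch, AssignAnySketch, ByteRegSketch };

// Hypothetical register dependency. Callers ask the dependency about the
// property instead of comparing the stored register to a special value.
class RegisterDependencySketch
   {
   public:
   explicit RegisterDependencySketch(RegNumSketch realReg) : _realReg(realReg) {}

   bool isAssignAny() const { return _realReg == AssignAnySketch; }
   bool isByteReg() const { return _realReg == ByteRegSketch; }

   private:
   RegNumSketch _realReg; // representation can change later without touching callers
   };

inline void registerDependencyExample()
   {
   RegisterDependencySketch dep(AssignAnySketch);
   assert(dep.isAssignAny());   // instead of: dep.getRealReg() == AssignAny
   assert(!dep.isByteReg());
   }
```

Because the comparison lives in one place, the special enum values could later be replaced by flags on the dependency without changing any call site.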
The `MayDefine` pseudo real register is never used. Remove it.
z/OS doesn't store the DWARF debug info in the target executable, but creates a .dbg file for every compilation unit. To collect these, we glob for .dbg files residing under a directory specified by the DDR_OBJECT_SEARCH_ROOT property on the ddrset target (which defaults to the ddrset's binary dir). In addition, paths to exclude from the globbing are specified in the DDR_OBJECT_EXCLUDES property.
MSVC can't deal with the templates in this file. The templates are used to test various data types and their alignment requirements in a maintainable way. Exclude the file for the time being.
Remove C++11-isms that old compilers don't support
Older MSVC versions don't support delegating constructors or alignof(). Use OMR_ALIGNOF() in place of alignof() and rework the CCData constructors to not require delegation.
Older XL doesn't support std::unique_ptr, manage array pointers directly.
Require address load instruction for statics in JITServer
In some cases, e.g. method enter/exit hooks, JITServer takes the static hook address directly from the client, but the address load might be encoded using offset relative to server's RIP, which results in an invalid address. This commit ensures that static address loads for out-of-process compilations always get loaded in a register first, preventing relative access.
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: f580a3e)
Remove unnecessary uses of double-colon rules in makefiles
Double-colon rules have special semantics that allow multiple scripts to be specified (that are executed in the order they appear in the makefile). The uses modified by this have no scripts and so don't need to be written specially.
Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com> (commit: 9626e94)
Improve selection criteria for IndVarElimination
Add a post-dominance check for the primary induction variable (PIV) and derived induction variable (DIV) to improve the selection criteria for the `redundantInductionVarElimination` optimization.
Previously when checking whether or not to replace a DIV with PIV one of the things we did was count the number of increments. We replaced DIV with PIV only if both had equal number of increments. However, when counting the increments we did not check if those increments were in the same block. Because of this we would sometimes replace a DIV even if it was in a different block than the PIV it was being replaced with. This resulted in addition of extra add and load instructions to compensate for the different block locations. Addition of those extra add and load instructions resulted in a performance regression.
To combat this regression, we have added domination relationship check for PIV and DIV. DIV will be replaced only if every increment of the DIV being replaced is post dominated by at least one PIV increment. This helps ensure we don't replace a DIV which needs extra instructions to load PIV.
This change is not enabled on x platform as the complex addressing modes mitigate the complexity of the expression used to eliminate the derived induction variable.
The post-dominance check can be disabled by setting the `TR_DisableIVEPostDominatorsCheck` environment variable.
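The rule described above (replace a DIV only if every DIV increment is post-dominated by at least one PIV increment) can be sketched as below. This is an illustration under assumptions: the block ids, `PostDomMap`, and the function name are hypothetical, and the post-dominator sets are taken as precomputed input rather than derived from a CFG.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

typedef int BlockId;
// postDominators[b] holds every block that post-dominates b (b itself included).
typedef std::map<BlockId, std::set<BlockId> > PostDomMap;

// Returns true only if each block containing a DIV increment is post-dominated
// by at least one block containing a PIV increment.
bool everyDivIncrementIsPostDominated(
      const std::vector<BlockId> &divIncrementBlocks,
      const std::vector<BlockId> &pivIncrementBlocks,
      const PostDomMap &postDominators)
   {
   for (BlockId divBlock : divIncrementBlocks)
      {
      PostDomMap::const_iterator entry = postDominators.find(divBlock);
      if (entry == postDominators.end())
         return false; // no information: be conservative
      bool covered = false;
      for (BlockId pivBlock : pivIncrementBlocks)
         {
         if (entry->second.count(pivBlock) != 0)
            {
            covered = true;
            break;
            }
         }
      if (!covered)
         return false;
      }
   return true;
   }
```

For a diamond-shaped flow graph (1 branches to 2 and 3, both join at 4), a DIV increment in block 2 qualifies when the PIV increment sits in block 4, but not when it sits in block 3.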
Use a default empty message when NULL passed to set error api
`omrerror_set_last_error_with_message` expects a non-null message though the API doesn't indicate that in the documentation.
Protect against the inadvertent passing of NULL by defaulting to an empty string for the message. This ensures that anyone checking for a message after this call won't get stale data.
Fixes segfaults when the NLS catalog can't be found as it's a common pattern to pass the NLS result directly to this call.
Signed-off-by: Dan Heidinga <heidinga@redhat.com> (commit: ed46fae)
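The NULL-defaulting behaviour described in this commit can be illustrated with a small sketch. It does not reproduce `omrerror_set_last_error_with_message`; `LastErrorSketch` and `setLastErrorWithMessageSketch` are hypothetical names used only to show the pattern.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-thread error state.
struct LastErrorSketch
   {
   int code;
   std::string message;
   };

// Stores the error code and message. A null message is replaced by an empty
// string, so a later reader never sees a message left over from an older error.
int setLastErrorWithMessageSketch(LastErrorSketch &state, int code, const char *message)
   {
   state.code = code;
   state.message = (nullptr != message) ? message : "";
   return code;
   }

inline void lastErrorExample()
   {
   LastErrorSketch state = { 1, "stale message" };
   setLastErrorWithMessageSketch(state, 42, nullptr); // e.g. NLS lookup returned NULL
   assert(state.message.empty());
   }
```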
This change looks like it has a lot of modifications but in reality the following was done:
- Move OMROpcodes.hpp into Opcodes.enum
- Eliminate `FOR_EACH_OPCODE` macro to allow for extensibility
- Whitespace formatting after removal of above
- Rename `MACRO` to `OPCODE_MACRO`
- Eliminate line breaks after each `OPCODE_MACRO`
- Modify all uses of `FOR_EACH_OPCODE` to define `OPCODE_MACRO`, include the new Opcodes.enum and undefine `OPCODE_MACRO`
The way extensibility will work is similarly to the other extensible enums in that downstream projects will create their own Opcodes.enum file and simply include the OMR one, then add their own `OPCODE_MACRO` definitions.
OMR files which `#include "il/Opcodes.enum"` will then automatically pick up the extended opcode list and produce the tables in question. There is some downstream work to be done, particularly in OpenJ9 following this change, to prepare to remove a bunch more `J9_PROJECT_SPECIFIC` code in OMR since the tables are now properly extensible.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 93bc6ad)
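The define/include/undefine pattern described above can be sketched in a single file. Since a self-contained example cannot include a separate `.enum` file, the hypothetical `SKETCH_OPCODES` macro stands in for the file contents, and the two-argument form of `OPCODE_MACRO` is a simplification assumed for the example.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the contents of an Opcodes.enum-style file. In the scheme
// described above this list would live in its own file and be pulled in
// with #include between the #define and #undef of OPCODE_MACRO.
#define SKETCH_OPCODES \
   OPCODE_MACRO(BadILOp, "BadILOp") \
   OPCODE_MACRO(iconst, "iconst") \
   OPCODE_MACRO(iadd, "iadd")

// Use 1: generate the enum.
enum SketchOpCode
   {
#define OPCODE_MACRO(opcode, name) opcode,
   SKETCH_OPCODES
#undef OPCODE_MACRO
   NumSketchOpCodes
   };

// Use 2: generate a name table from the same single source of truth.
static const char *const sketchOpCodeNames[] =
   {
#define OPCODE_MACRO(opcode, name) name,
   SKETCH_OPCODES
#undef OPCODE_MACRO
   };

inline void opcodeTableExample()
   {
   assert(NumSketchOpCodes == 3);
   assert(std::strcmp(sketchOpCodeNames[iadd], "iadd") == 0);
   }
```

A downstream list would append its own `OPCODE_MACRO` entries after the base list, and every generated table would grow with it.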
Deprecate child types array in Tree.cpp in favor of expectedChildType API
We already have an API `expectedChildType` which extracts the expected child type from the Opcodes.enum metadata array. Rather than using a manually maintained and error prone table, we simply extract the child type from one source of truth metadata table.
This eliminates the need to maintain yet another table manually when dealing with updating/adding/removing opcodes.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 3cf1194)
Move Opcodes.enum into OMROpcodes.enum to allow for extensibility
Downstream projects will need to include OMROpcodes.enum and extend the list inside their own Opcodes.enum. This is to ensure projects which do not add the root directory containing OMR to the IPATH are able to include OMR's own list of opcodes. Having OMROpcodes.enum in such cases prevents circular includes and name clashes when trying to extend the opcodes list.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: aa1b2b6)
Remove enumValue from Opcodes.enum in favor of opcode
The `enumValue` field of this metadata table serves no real purpose; it is a copy of the `opcode` field, sans the `TR::` prefix. Here we modify the `opcode` field to remove the `TR::` prefix and instead add this prefix in the one location that needs it, i.e. within the compiler/il/OMRILOpCodesEnum.hpp file.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 4c7ba6c)
Add Dynamic Breadth First Scan Ordering to Balanced GC
Add a gc scavenger scan ordering feature that enables the copying of a hot field marked by the JIT immediately after the object containing the hot field is copied for balanced gc policy.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: ebc1cc3)
Take advantage of the Opcode.enum macro generator to generate the VP handlers table. Some parts still need to be guarded by the project specific ifdef until we make the VP table into a proper extensible enum. This requires some boilerplate code which will be done after we have everything working.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: e4e8c95)
Prepare simplifier tables for downstream extension
- Move the handlers into a .enum file
- Generate the handlers using the Opcode.enum opcode macros
- Remove OpenJ9 project specific handlers since they are now automatically generated
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 60cf435)
Without '/fo', rc writes the *.res file in the same directory as the *.rc file. Some versions of make form the command as rc {options} ./win32/omrsyslogmessages.rc which yields the *.res file in the win32 directory which then won't be found at the link step.
Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com> (commit: d3a0bb3)
NULL ptr field `_postDominators` to avoid scope issues in future
Pointer field `_postDominator` gets initialized to a local variable. As of right now all access points of the variable are guarded by NULL checks. Set it to NULL before we exit the method to avoid bugs which may arise from accidental use of the variable without a NULL check.
Set _compressObjectReferences if mixed build override not defined
`_compressObjectReferences` should only be defined if `OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES` is not defined. Otherwise, when mixed mode is enabled and the override is defined, `_compressObjectReferences` is considered a private field that is declared, but never used, which the compiler may complain about.
Signed-off-by: Sharon Wang <sharon-wang-cpsc@outlook.com> (commit: 61a4993)
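The conditional declaration described above might look like the following sketch. `ObjectModelSketch` is a hypothetical class; only the `OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES` macro and the `_compressObjectReferences` field name come from the commit message.

```cpp
#include <cassert>

class ObjectModelSketch
   {
   public:
   explicit ObjectModelSketch(bool compress)
#if !defined(OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES)
      : _compressObjectReferences(compress)
#endif
      {
      (void)compress; // unused when the override decides the answer
      }

   bool compressObjectReferences() const
      {
#if defined(OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES)
      return 0 != OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES;
#else
      return _compressObjectReferences;
#endif
      }

   private:
#if !defined(OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES)
   // Only declared when it is actually read, which avoids
   // "private field is not used" warnings in override builds.
   bool _compressObjectReferences;
#endif
   };

inline void compressReferencesExample()
   {
#if !defined(OMR_OVERRIDE_COMPRESS_OBJECT_REFERENCES)
   ObjectModelSketch model(true);
   assert(model.compressObjectReferences());
#endif
   }
```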
Use TR_BitVector instead of CS2::ABitVector in TR_InterferenceGraph
Some of the largest symbols in the compiler obj files originate from TR_InterferenceGraph as a result of its use of CS2::ABitVector, which provides no real benefits in terms of capabilities or performance. Use of CS2::ABitVector has been replaced with TR_BitVector in TR_InterferenceGraph, which results in a significant reduction in the size of the Compiler component's obj files, whether it is part of a standalone OMR build or part of a downstream project.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: 6d806b9)
In order to achieve this, following changes have been made:
- Relocate `formatAndOutput()` and `formatAndOutputV()` to `MM_VerboseBuffer` from `MM_VerboseWriterChain`
- In `MM_VerboseHandlerOutput`:
  - Modified `writeVmArgs` to take a `MM_VerboseBuffer`
  - Factored out lines for printing `Initialized` stanza in `handleInitialized`
  - Created `outputInitializedStanza` dedicated to output `Initialized` stanza, which also takes a `MM_VerboseBuffer`
  - In `outputInitializedStanza`, replaced `handleInitializedInnerStanzas` with `outputInitializedInnerStanza`
- In `VerboseHandlerOutput`:
  - Modified `writeVmArgs` to take a `MM_VerboseBuffer`
  - Implement virtual method `outputInitializedInnerStanza()`
  - Implement `MM_VerboseManager::getVerboseHandlerOutput()`
- In `VerboseWriterChain`:
  - Delete `formatAndOutputV()`
  - Redirect `formatAndOutput()` to `MM_VerboseBuffer`
  - Implement `getBuffer()`
- In `MM_VerboseWriterFileLoggingBuffered` and `VerboseWriterFileLoggingSynchronous`:
  - Print `Initialized` stanza when the second file opens (the first file is handled by `TRIGGER_J9HOOK_MM_OMR_INITIALIZED`)
- In `MM_VerboseWriterFileLogging::endOfCycle`, open a file right after the last file closes to make sure a new `Initialized` stanza is printed in `openFile(...)`.
- Changes to generalize `handleInitialized` to work on any provided buffer rather than a writer chain specific buffer. This is done to bypass the writer chain with a new buffer during openFile to print the stanza specifically for the file being opened.
Temporarily disable Power reverse load/store optimizations
Previously, stores/loads with byteswaps were generating optimized sequences in the Power codegen. Unfortunately, in some cases this support required the use of delayed indexed-form, which has been found to be broken. In order to temporarily correct issues arising from these optimizations, they have been disabled until the underlying issue is corrected.
Issue: #5684 Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 24f32c2)
GCC <5 advertises C++11 support (i.e. __cplusplus >= 201103L) but doesn't have std::is_trivially_copyable. It does have an extension __has_trivial_copy that behaves much the same way.
This patch introduces a macro OMR_IS_TRIVIALLY_COPYABLE that falls back to the latter under GCC <5.
We have to support building in environments that don't support <type_traits> at all, or only support some of the features we need. Complicating matters is the fact that not all features have standard test macros, nor can we rely on the code being built with CMake (which would allow us to detect compiler support for specific features at build-config time). Instead we'll rely on the __cplusplus macro to detect C++11 and assume that if the compiler supports C++11 it will have most of the traits we need.
An exception to the above is GCC<5, which claims C++11 support but doesn't have is_trivially_copyable. For that case, use the OMR_IS_TRIVIALLY_COPYABLE macro instead, which falls back to a GCC extension where necessary.
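A fallback macro of the kind described above could be written roughly as follows. This is a sketch, not the definition of `OMR_IS_TRIVIALLY_COPYABLE`: the macro name `SKETCH_IS_TRIVIALLY_COPYABLE` and the exact compiler-version test are assumptions made for the example.

```cpp
#include <type_traits>

// GCC older than 5 lacks std::is_trivially_copyable, so fall back to the
// __has_trivial_copy extension there and use the standard trait elsewhere.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 5)
#define SKETCH_IS_TRIVIALLY_COPYABLE(T) (__has_trivial_copy(T))
#else
#define SKETCH_IS_TRIVIALLY_COPYABLE(T) (std::is_trivially_copyable<T>::value)
#endif

// A plain aggregate: safe to copy byte by byte.
struct PlainKeySketch
   {
   int id;
   char tag;
   };

// A type with a user-provided copy constructor: not trivially copyable.
struct TrackedSketch
   {
   TrackedSketch() {}
   TrackedSketch(const TrackedSketch &) {}
   };

static_assert(SKETCH_IS_TRIVIALLY_COPYABLE(PlainKeySketch), "plain keys can be stored as raw bytes");
static_assert(!SKETCH_IS_TRIVIALLY_COPYABLE(TrackedSketch), "types with custom copy logic are rejected");
```

Note that a macro of this shape does not accept type names containing a top-level comma, such as a two-argument template instance, without an alias.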
Depending on the internal unit size and alignment, some storage buffers may not be large enough to hold any data at all. To avoid this we shouldn't create really small buffers.
Casting to/from data_t* (for non-char data_t) simplifies pointer arithmetic but requires reinterpret_casts and has potential to violate aliasing rules. This patch replaces such casts with manual calculations which allows for some reinterpret_casts to be converted to safer static_casts.
We can also avoid reinterpret_cast when constructing keys because keys are now guaranteed to be stored as a sequence of bytes.
Prevent MVC reduction in astoreEvaluator if symref is unresolved
The `astoreEvaluator`, more precisely the `astoreHelper` has a path in which it is trying to identify the following tree pattern:
```
astore <x>
  aload <y>
```
and tries to generate an `MVC` instruction to perform the store. The problem with this is that it generates a memory reference from the `node` only for the purpose of testing whether it has an index register (which, incidentally, it cannot know until the memory reference is used within an instruction). The act of generating a memory reference is not side-effect free, as the symref could have been unresolved and as such various metadata can be generated.
The `astoreHelper` then goes ahead and creates another memory reference from the node and uses that for the `MVC` instruction. This means the metadata for handling unresolved symrefs could have been generated twice.
Similarly to `directMemoryStoreHelper` we simply prevent this `MVC` reduction if the symref is unresolved.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 81f8f75)
Swap dequeue and decrement in Scavenger scan queue
Currently, we decrement the counter of non-empty sub-queues after we remove an element from a sub-queue. In a corner case, when this is the last element of the last non-empty queue, for a moment there will be effectively no elements in the queue, but the counter will not be zero. If the GC thread doing dequeue is stalled (by OS due to overloaded CPUs) at that moment, other worker GC threads may spin in a higher level scan loop trying to dequeue a (non-existent) element from seemingly non-empty queue. The spinning takes CPU resources and does not help the stalled thread resume.
By re-ordering these 2 steps, if a thread stalls between them, the queue will appear empty before the dequeue, which will make other GC threads quickly block (although they may briefly spin on the spinlock of that non-empty sub-queue). That in turn will release CPU resources and give the stalled thread a chance to resume running.
This new order of operations in dequeue now mirrors the order of the enqueue steps (first add the element and then increment), which makes sense.
The explained problematic scenario is extremely rare, and this change for the most part is not expected to change Scavenge behavior.
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: 58f212d)
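The reordering can be illustrated with a simplified sub-queue. This is a sketch under assumptions, not the Scavenger scan queue: `SubQueueSketch`, the `std::mutex` standing in for the sub-queue spinlock, and the shared counter of non-empty sub-queues are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>

class SubQueueSketch
   {
   public:
   explicit SubQueueSketch(std::atomic<int> &nonEmptyCount) : _nonEmptyCount(nonEmptyCount) {}

   void enqueue(int item)
      {
      std::lock_guard<std::mutex> guard(_lock);
      _items.push_back(item);            // first add the element...
      if (_items.size() == 1u)
         _nonEmptyCount.fetch_add(1);    // ...then advertise the sub-queue as non-empty
      }

   bool dequeue(int &item)
      {
      std::lock_guard<std::mutex> guard(_lock);
      if (_items.empty())
         return false;
      if (_items.size() == 1u)
         _nonEmptyCount.fetch_sub(1);    // first stop advertising work...
      item = _items.front();             // ...then remove the last element
      _items.pop_front();
      return true;
      }

   private:
   std::mutex _lock;
   std::deque<int> _items;
   std::atomic<int> &_nonEmptyCount;     // read by other threads without taking _lock
   };
```

If the dequeuing thread stalls between the two steps, other threads that read the counter see zero and block, rather than spinning on a sub-queue that is about to become empty.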
This is a fix for compile error in PR: https://github.com/eclipse/omr/pull/5607
A previous PR removed handleInitializedRegion() and writeVmArgs() prematurely. Fixed by adding empty implementations for these two methods. These empty methods are temporary and will be removed once downstream projects (OpenJ9) start using the new API.
SupportsAlignedAccessOnly flag is no longer set for 64 bit Power 9. SupportsAlignedAccessOnly flag is no longer set for 64 bit BE Power 8. SupportsAlignedAccessOnly flag is now set for 32 bit Power 7.
This is done to more accurately represent which cases have performance issues with unaligned accesses.
Added attribute `contextid` to `concurrent-collection-start` stanza. Changed name of `concurrent-collection-start` stanza to `concurrent-global-final` to show its relation to the `cycle-end` stanza which follows it. Removed `concurrent-collection-end` to reduce redundancy with the `exclusive` stanza that follows it. Overloaded `MM_VerboseHandlerOutput::getTagTemplate()` to take `contextid`.
Add NoReg dependencies for Power two-reg form fixedSeqMemAccess
This fix ensures that no register spills occur in the middle of the fixedSeqMemAccess sequence, which would break the JIT/AOT address patching code as it assumes specific relative instruction addresses.
Failing to default initialize these fields to 0 when a new instruction is generated can result in an ignored byte in the instruction having a non-zero value. This can cause bugs in the future when the byte may no longer be ignored.
AArch64: Unset hasResumableTrapHandler if TR_DisableTrap is set
Change initializer of `OMRCodeGenerator` not to set `hasResumableTrapHandler` if `TR_DisableTrap` is set. Set `_numberBytesReadInaccessible` and `_numberBytesWriteInaccessible` to 4096 if `TR_DisableTrap` is not set.
This commit replaces `TR_RISCV_?TYPE()` macros with (inline) functions and provides overloaded versions for easier use. This provides a nicer instruction encoding interface for encoding snippets.
Signed-off-by: Jan Vrany <jan.vrany@fit.cvut.cz> (commit: d5418ee)
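Inline encoding functions with overloads might look like the sketch below. The function names, the `GprSketch` enum, and the parameter order are hypothetical and are not the signatures introduced by the commit; the bit layouts follow the standard RISC-V R-type and I-type formats.

```cpp
#include <cassert>
#include <cstdint>

// R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode
inline uint32_t encodeRTypeSketch(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                                  uint32_t rd, uint32_t rs1, uint32_t rs2)
   {
   return ((funct7 & 0x7Fu) << 25) | ((rs2 & 0x1Fu) << 20) | ((rs1 & 0x1Fu) << 15)
        | ((funct3 & 0x7u) << 12) | ((rd & 0x1Fu) << 7) | (opcode & 0x7Fu);
   }

// Hypothetical typed register numbers, used by the overload below.
enum class GprSketch : uint32_t { x0 = 0, x1, x2, x3 };

// Overload taking typed registers, so callers do not pass raw numbers.
inline uint32_t encodeRTypeSketch(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                                  GprSketch rd, GprSketch rs1, GprSketch rs2)
   {
   return encodeRTypeSketch(opcode, funct3, funct7, static_cast<uint32_t>(rd),
                            static_cast<uint32_t>(rs1), static_cast<uint32_t>(rs2));
   }

// I-type: imm[11:0] | rs1 | funct3 | rd | opcode
inline uint32_t encodeITypeSketch(uint32_t opcode, uint32_t funct3,
                                  uint32_t rd, uint32_t rs1, int32_t imm)
   {
   return ((static_cast<uint32_t>(imm) & 0xFFFu) << 20) | ((rs1 & 0x1Fu) << 15)
        | ((funct3 & 0x7u) << 12) | ((rd & 0x1Fu) << 7) | (opcode & 0x7Fu);
   }

inline void riscvEncodingExample()
   {
   // add x1, x2, x3
   assert(encodeRTypeSketch(0x33u, 0u, 0u, GprSketch::x1, GprSketch::x2, GprSketch::x3) == 0x003100B3u);
   // addi x1, x0, 1
   assert(encodeITypeSketch(0x13u, 0u, 1u, 0u, 1) == 0x00100093u);
   }
```

Compared with macros, functions give type checking on the arguments, and the overload set lets a snippet pick whichever form is most convenient.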
Add preProcess stage to recognized call transformer
Some transformations require additional data or the trees to be preprocessed, so add a preProcess extension point to be implemented by downstream projects.
Also dump trees before and after the transformation when trace is on.
Signed-off-by: Liqun Liu <liqunl@ca.ibm.com> (commit: bb49727)
AArch64: Add a method for setting up implicit exception point to InstructionDelegate
Add `setupImplicitNullPointerException` method to InstructionDelegate class and change instruction classes with memory reference to call it from their constructor.
This documentation gives a high level structure of the new inliner.
The points of the documentation:
- Different components of the new inliner and their responsibilities.
- The relationship between different components of this new inliner.
- Parts that need language-specific implementations.
Add the helper classes for abstract interpretation
Including:
- AbsValue.cpp. The abstract representation of a 'value'.
- AbsOpStack.cpp. The abstract representation of an operand stack.
- AbsOpArray.cpp. The abstract representation of an operand array.
Common parts for testing compiler/optimizers are extracted out. Those parts are put into a new class called CompilerUnitTest.hpp. From now on, CompilerUnitTest.hpp can be used as a base class for compiler unit tests.
Contribution to OMR build system for shared cache configuration
This commit introduces a top-level directory for Shared Cache. The files mentioned in sharedcache/CMakeLists.txt will be moved in later PR's. Introduction of OMR_SHARED_CACHE as a top-level directory flag, necessitated changes to multiple configure/mk/in files.
Also included in this commit are:
* additions to the CMake build system needed to build the shared cache, and
* additions to TRMemory.(hpp/cpp) and omrmemcategories.h for shared cache object types and memory categories
Issue: #4610
Co-authored-by: Mark Thom <markjordanthom@gmail.com>
Co-authored-by: Damian Diago D'monte <damian.dmonte@unb.ca>
Add TR::MemoryReference::create API with trivial implementation
This API is meant to be overridden in downstream projects that wish to have custom logic for generating memory references. A typical example is the handling of unresolved memory references.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 2ebbeb3)
Use the new `TR::MemoryReference::create` to generate memory references from nodes. This API can be overridden by downstream projects for custom logic.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 8793338)
API additions expose details about acquired monitors such as owner, number of times acquired, and information about waiting threads. May be useful for runtimes that require specific information about the state of acquired monitors, such as Eclipse OpenJ9 snapshot+restore.
This commit improves the CMake toolchain file to work with both the "official" RISC-V GNU Compiler Toolchain and the Debian (Ubuntu) provided RISC-V toolchains.
Signed-off-by: Jan Vrany <jan.vrany@fit.cvut.cz> (commit: c53240e)
Remove redundant entries from TR_S390LinkageConventions
Now that this enum is removed, the redundant entries introduced by this enum can be removed. This includes TR_JavaHelper and TR_JavaPrivate which can be collapsed into TR_Helper and TR_Private.
Add support for relo records for block frequency and recomp queued flag
In OpenJ9 the JProfiling framework is used to create profiled code which contains references to internal data structures like block frequencies stored in the `TR_BlockFrequencyInfo` class, or the flag for checking if the method is already in the recompilation queue. To support AOT compilation for such code, we need to generate relocation records for such data structures referenced by the compiled code. OpenJ9 PR https://github.com/eclipse/openj9/pull/9591 creates the necessary structures representing the relocation record for this data. This commit adds the changes required by the OpenJ9 PR by adding the required enums for the new relocation records and providing an API for tagging the symbols appropriately.
These fields contain the information that is needed to generate relocation records for inlined methods; this is necessary in the scenario where not every inlined method will have a unique guard.
The existing incInlineDepth API does not contain the right parameters that are needed to store information that will be used to generate relocation records for the inlined call.
Add Breakpoint Guard to list of known types for AOT
Breakpoint Guards are used to ensure that if a breakpoint is set on an inlined method, it is reported to the runtime if the method is executed. In order to facilitate enabling this support for relocatable compilation (currently only in Eclipse OpenJ9), add TR_BreakPointGuard to the list of AOT Guard Sites.
Add more performant in-reg byte-reverse series of instr for P8 & P9
As rlwimi became cheaper in P8 & P9, using it for the in-register byte-reverse instruction sequence results in fewer total instructions and better performance.
AArch64: Use b.cond instead of cbz/cbnz when all registers are used up
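To show why rotate-and-insert suits byte reversal, the sketch below emulates the well-known three-instruction 32-bit sequence (one rotate plus two `rlwimi`) in portable C++. It is an assumption that the commit uses a sequence of this shape; the helper names are hypothetical and the exact instructions emitted by the codegen are not shown in the message.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits (n taken modulo 32).
inline uint32_t rotl32Sketch(uint32_t value, unsigned n)
   {
   n &= 31u;
   return (0u == n) ? value : ((value << n) | (value >> (32u - n)));
   }

// Mask covering bits mb..me in Power bit numbering (bit 0 is the MSB), mb <= me.
inline uint32_t powerMaskSketch(unsigned mb, unsigned me)
   {
   return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
   }

// Emulates rlwimi: rotate rs left by sh and insert it into ra under the mask.
inline uint32_t rlwimiSketch(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
   {
   const uint32_t mask = powerMaskSketch(mb, me);
   return (rotl32Sketch(rs, sh) & mask) | (ra & ~mask);
   }

// Byte-reverse a 32-bit value held in a register.
inline uint32_t byteReverse32Sketch(uint32_t src)
   {
   uint32_t result = rotl32Sketch(src, 8);          // rotlwi  result, src, 8
   result = rlwimiSketch(result, src, 24, 0, 7);    // rlwimi  result, src, 24, 0, 7
   result = rlwimiSketch(result, src, 24, 16, 23);  // rlwimi  result, src, 24, 16, 23
   return result;
   }

inline void byteReverseExample()
   {
   assert(byteReverse32Sketch(0x11223344u) == 0x44332211u);
   }
```

The first rotate places two of the four bytes correctly; each `rlwimi` then fixes one remaining byte without disturbing the others.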
`ificmpeq` and its variant nodes have 3 children and the third child is `GlRegDeps` whose children are `aRegLoad`/`iRegLoad`/`PassThrough` of global registers. In some cases, `GlRegDeps` node uses all the allocatable integer registers. If this happens, we cannot use `cbz`/`cbnz` instruction for branch because `cbz`/`cbnz` requires a register in addition to those global registers. This commit changes `ificmpHelper` to use `b.cond` instruction for that case.
New APIs were introduced in #5607 and made the following APIs obsolete: `MM_VerboseHandlerOutput::writeVmArgs` and `MM_VerboseHandlerOutput::handleInitializedRegion`.
Add field to adjust compact to contract minimum contraction ratio
This is intended to allow a new command-line option to adjust the ratio of high address free tenure space to requested contraction size which causes an extra compaction. A higher ratio results in more performance-heavy compactions and allows tenure to contract more.
Signed-off-by: Jason Hall <jasonhal@ca.ibm.com> (commit: f7f354f)
Fix 0 getContractionSize() in MemorySubSpaceGenerational
Fixes a bug in Gencon where an incorrectly returned 0 contraction size prevents additional compaction to aid contraction from occurring. The generational memory subspace was returning its own unused field rather than the correct contraction size of the actual sub space.
Signed-off-by: Jason Hall <jasonhal@ca.ibm.com> (commit: c6f0224)
Allow tenure to contract beyond LOA size when LOA is empty
With an LOA, tenure contraction was previously limited to only contracting within the LOA. This allows tenure to contract further when the LOA is empty. This is accomplished by contracting both the LOA and SOA simultaneously, then rebalancing the LOA size.
Signed-off-by: Jason Hall <jasonhal@ca.ibm.com> (commit: 55c525e)
Refactor prex arg info computation and propagation
Part of an effort to refactor prex arg info. This commit contains the following changes:
- Move two TR_PrexArgInfo functions from OpenJ9 to OMR such that we can merge prex arg infos in OMR.
- When creating prex arg info for a target, propagate arg info from call site.
- Add an inliner util to clear arg info for variant argument given a prex arg info
- `computePrexInfo` will not assume storing the result to `_prexArgInfo` of a target, instead, the user will take the result and store it to the place they want.
Part of https://github.com/eclipse/openj9/issues/11584
Signed-off-by: Liqun Liu <liqunl@ca.ibm.com> (commit: 71d22eb)
Update signalExtendedTest re synchronous signals unblock with SIGSEGV
Because synchronous signals get unblocked when they are used, the test was updated to verify the signal mask before testing.
The masked-thread pthread_kill test now uses SIGSEGV instead of SIGILL, because SIGILL is used by the SIG_MASK inheritance test and because this keeps it consistent with the unmasked-thread pthread_kill test.
Simplify field names and remove unnecessary routine
Perform the following cleanup:
- Rename explicitLinkageType to linkageType
- Add enum entries in OMR and remove magic constant
- Remove isOSLinkageType routine
In `hasOldExpressionOnRhs()`, we temporarily change wrtbar to aloadi for the syntactic comparison in `areSyntacticallyEquivalent()`. When it’s an indirect store, the number of children of the node is set to 1, otherwise 0. Because the check on whether or not the node is an indirect store happens after the wrtbar node is recreated as aloadi, `node->getOpCode().isStoreIndirect()` is false for aloadi, and the number of children for the wrtbar node ends up as 0. This prevents `areSyntacticallyEquivalent()` from comparing the first child of the two nodes, so it concludes that different expressions are the same. The fix is to check whether it’s an indirect store using the original node.
Fixes eclipse/openj9#11569
Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com> (commit: 18b7e5d)
The b2iEvaluator currently does not generate sign extension for bRegLoad. While it will work for a sign-extended bRegLoad, it may yield a wrong result if the byte was not sign extended. This commit covers such cases by making sure bRegLoad is always sign extended.
Fixes: eclipse/openj9#11192 Signed-off-by: Mohammad Nazmul Alam <mohammad.nazmul.alam@ibm.com> (commit: 5ae1d2b)
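The effect of the missing sign extension can be shown with plain integer arithmetic. This sketch only models the value semantics of a byte-to-int conversion on a register whose upper bits are not guaranteed to be sign extended; `b2iSketch` is a hypothetical helper and not the evaluator code.

```cpp
#include <cassert>
#include <cstdint>

// Converts the low byte of a register value to a sign-extended 32-bit int,
// regardless of what the upper 24 bits of the register currently hold.
inline int32_t b2iSketch(uint32_t registerValue)
   {
   const int32_t low = static_cast<int32_t>(registerValue & 0xFFu);
   return (low ^ 0x80) - 0x80; // portable sign extension from bit 7
   }

inline void b2iExample()
   {
   // Byte 0xFF is -1. If the register was zero extended, using it as-is
   // would yield 255; the explicit sign extension yields -1.
   assert(b2iSketch(0x000000FFu) == -1);
   // Stale upper bits are ignored as well.
   assert(b2iSketch(0x123456FEu) == -2);
   }
```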
Declaring the functions inline allows the compiler to skip emitting callable symbols for them in `CCData.o`, but because the functions are declared in a header that's included in other compilation units the linker complains that there are no implementations for said functions available, even though they're declared private and not referenced outside of `CCData.o`.
Unfortunately said functions need to be declared as class members because they refer to private typedefs in the CCData class.
Create relo records of type TR_BlockFrequency and TR_RecompQueuedFlag
In OpenJ9 the JProfiling framework is used to create profiled code which contains references to internal data structures like block frequencies stored in the `TR_BlockFrequencyInfo` class, or the flag for checking if the method is already in the recompilation queue. To support AOT compilation for such code, we need to generate relocation records for such data structures referenced by the compiled code. This commit adds the code for generating the relocation records for block frequency and the recompilation queued flag. Currently it only handles x and p architectures.
Convert `concurrent-mark-end` to `gc-op` in GMP https://github.com/eclipse/openj9/issues/11239
- Replace `concurrent-mark-end` in `concurrent-end` with `gc-op`
- Move `terminationReason` as one attribute to the `concurrent-end` stanza for both concurrent marking and concurrent scavenging
- Delete warning for reporting termination reason in scavenging (rare case)
- Report concurrent marking duration `timems` in gc-op
- Add new API to get termination reason `MM_VerboseHandlerOutput::getReasonForTermination`
- Consolidate `handleConcurrentGCOpEnd` into `handleConcurrentEndInternal`
This commit removes OMR::VMEnv::heapBaseAddress() as this routine always returns 0. It was introduced early on in the codebase and has since been superseded by a shift-based solution. Hence it is no longer needed.
@andrewcraik has retired from his position as a project committer and has requested to be removed as a code owner. He no longer has a valid ECA so I am committing the change for him.
Thanks for all your contributions and for serving as a project committer, Andrew!
Disallow prefetch insertion when read barriers are necessary
Disallow prefetch insertion when read barriers are necessary because this optimization may insert loads of array elements past the 0th element for a backwards traversal. That is, if the primary induction variable is traversing the array backwards, on the last iteration of the loop we will be prefetching the (i - 1)th element of the array. Since the prefetch needs to load such an element, we must ensure the read barrier will not trigger on such a value which may "look like" an object. For example today, the (i - 1)th element is really the last word of the array header, which is the `dataAddr` pointer which looks like an object, but it is not. Thus the read barrier may incorrectly trigger on such a value.
In theory the same issue can happen on a forward traversal since there may be padding bytes past the end of an array. For this reason we go with the safest route and just disable the entire prefetch insertion optimization if read barriers are necessary.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 6fef007)
Fix gencon to work with flattened arrays
Implemented fix to allow gencon to work with flattened arrays. Currently does not support partitioning for multithreaded scanning.
Enable jitPersistentAlloc/Free to use non-global persistent memory
Instead of using the global `::trPersistentMemory` object, `jitPersistentAlloc/Free` now use a new method `OMR::CompilerEnv::persistentMemory`. This does not change the behavior for OMR, but downstream projects can override the `persistentMemory` method and return memory objects that are not the global one.
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: cadb5ba)
This commit adds support for running (cross-compiled) tests on the host using an emulator, specified by the `OMR_TEST_LAUNCHER` CMake variable.
To make this work, this commit introduces a new function, `omr_add_test()`, that behaves just like CMake's `add_test()` but runs the command under the emulator if needed.
Signed-off-by: Jan Vrany <jan.vrany@fit.cvut.cz> (commit: 68f2034)
Removing the unused implementation of applyUserOptions. applyUserOptions was implemented only on Power so that on startup it was possible to disable P10 support when TR_EnableExperimentalPower10Support is not set. applyUserOptions was ultimately implemented but never used. Since the method is not implemented on any other platform, it should be removed from the class.
SIGABRT only has one global signal handler, so we cannot guard function calls against SIGABRT using the port library APIs. Raising a SIGTRAP is useful for downstream projects who may want to catch such signals for compilation thread crashes and requeue such compilations (for another attempt, or perhaps to generate additional diagnostic data).
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 5a20f67)
Update pointer offset addition for dbf Scan Ordering
Update pointer offset addition for adding a hot field offset to an object pointer in order to copy the hot field of the object for dbf Scan Ordering in gencon GC.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: 32d447e)
Create a DIE for a tag that dwarfdump-classic doesn't understand
dwarfdump-classic is an old tool that has not been updated to support current versions of clang on macOS. We don't need to recognize the DWARF version 5 concepts used by newer compilers; we can just ignore them.
Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com> (commit: 6c50f7c)
- Move handling of -q64 flag to ensure it gets properly set regardless of scoping
- Add -qnosearch and -I/usr/include/metal by default, as standard C headers are unusable by metal-c
- Move port library metal-c files to a new library to work around limitations in older versions of CMake
- Combine metal-c compilation rules. CMake does not seem to like having custom rules which look like generated rules, i.e. rules which depend on .c files and produce .o files
Improve Parallelization of Remembered Set Scanning in Gencon
Improve the parallelization of remembered set scanning in the final stop-the-world phase of the concurrent global GC cycle in Gencon. Remove old solution that used command line option -XXgc:dirtCardDuringRSScan.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: b429468)
AArch64: Generate memory barrier for store if symbol is shadow and ordered
This commit changes commonStoreEvaluator to generate memory barrier if the symbol of the node is `Shadow` and `Ordered`. The logic for deciding whether memory barrier is generated or not is borrowed from ppc codegen.
See eclipse/omr#5648 for discussion. It was decided during the OMR architecture meeting that this API is not very clear and that inlining its uses is more understandable, as it makes explicit what is being checked.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 7219168)
This API now has no uses and is really just checking whether reference count is 1, so there is no need for the API to exist because we already have `getReferenceCount`.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 234c0aa)
Using `printf` methods _(e.g. `omrfile_printf` & `omrfilestream_printf`)_ to write to files is problematic when the output buffer contains a string with a format specifier (e.g. %s). In such a case, the specifier is intended to be output raw to the file rather than evaluated/expanded, but `printf` will attempt to evaluate it. Currently, `printf` methods are used when outputting the verbose initialized block (string).
When outputting the init block, use the verbose writer's `outputString` method rather than using:
- `omrfilestream_printf` for FileLoggingSynchronous Writer
- `omrfile_printf` for FileLoggingBuffered Writer
`outputString` will use `omrfile_write_text` and just print the raw characters.
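The difference between expanding a string as a format and emitting it raw can be sketched with the standard C library. `renderRaw` below is a hypothetical stand-in for a raw output path, not the port library's implementation; it passes the text as an argument to `"%s"` so that any `%` inside the text is copied verbatim.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a raw output path. The text is never used as the
// format string, so specifiers embedded in it are not interpreted.
std::string renderRaw(const std::string &text)
{
    // First call only measures the required length.
    int needed = std::snprintf(nullptr, 0, "%s", text.c_str());
    if (needed < 0)
        return std::string();

    std::string out(static_cast<std::size_t>(needed), '\0');
    // Second call fills the buffer; the extra byte holds the terminating NUL.
    std::snprintf(&out[0], out.size() + 1, "%s", text.c_str());
    return out;
}
```

Calling `std::printf(text.c_str())` instead would treat `%s` or `%d` inside `text` as requests for arguments that were never supplied, which is undefined behaviour.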
It turns out that the evaluator for s2l was incorrectly sign-extending its source register as an integer rather than as a short. This was previously masked by the fact that many evaluators would unnecessarily sign-extend their results. In general, the upper bits of a register containing a short are undefined, so it is incorrect to assume it has been sign-extended.
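A small model of the two behaviours, assuming a 64-bit register whose low 16 bits hold the short and whose upper bits are arbitrary. The function names are illustrative only.

```cpp
#include <cstdint>

// Sign-extends from bit 31, i.e. treats the register as holding an int.
// This models the incorrect behaviour described above. The unsigned-to-signed
// conversions wrap on two's complement targets (guaranteed since C++20).
std::int64_t s2lFromInt(std::uint64_t reg)
{
    return static_cast<std::int64_t>(
        static_cast<std::int32_t>(static_cast<std::uint32_t>(reg)));
}

// Sign-extends from bit 15, i.e. treats only the low 16 bits as meaningful.
std::int64_t s2lFromShort(std::uint64_t reg)
{
    return static_cast<std::int64_t>(
        static_cast<std::int16_t>(static_cast<std::uint16_t>(reg)));
}
```

With a register value of `0x0001FFFE` the short is -2, which `s2lFromShort` returns, while `s2lFromInt` yields 131070 because the junk in bit 16 leaks into the result.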
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: eccafd9)
Fix s2i improperly clobbering its child's register
Previously, the Power codegen's implementation of the s2i opcode would attempt to reuse the child node's register if its reference count was 1. While this works most of the time, this causes issues if the child node itself reused a register from its child. In this case, the clobbered register may belong to a node whose reference count is not 1, which can result in incorrect behaviour.
In general, there is no reason to try and reuse the register of the child node in s2i, since an extsh instruction is always required. Doing so does not avoid any register shuffles or additional instructions. Accordingly, the s2i evaluator will now always allocate its own register.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: b1024d0)
Simplify Power integral narrowing conversion evaluators
The Power codegen's evaluators for performing integral narrowing conversions (e.g. i2s) have been significantly simplified to remove unneeded handling. Previously, some of these evaluators would perform sign- and zero- extension, which seems to be a remnant of a time when the upper bits of sub-integer values in registers were not considered to be undefined.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 2076fd4)
Introduce an API for creating loads/stores based on nodes
Previously, evaluation of loads and stores based on nodes was generally handled by manually using TR::MemoryReference::createWithRootLoadOrStore and then performing operations on the resulting memory reference. However, this turns out to be wildly unsafe in the presence of volatile symrefs (it requires the caller to manually check) and also requires that MemoryReference be coupled to tree evaluation.
As a first step towards addressing this, a new API has been introduced that generates loads and stores based on a node. This new API has the requisite options to support existing optimizations that perform loads/stores with different opcodes to what the node might suggest while still hiding the details about TR::MemoryReference and volatile fences behind a unified interface.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: c475615)
Fix handling of non-volatile unresolved symrefs on ppc32
As it turns out, under certain corner cases we can generate unresolved symrefs that are not marked as volatile. In the event that such a symref is loaded as a long on ppc32, LoadStoreHandler would get confused and fail to use its handling for unresolved on such symrefs. Instead, it would generate two separate loads/stores, leaving an unused snippet that will cause a crash later. This has been corrected and LoadStoreHandler now checks for unresolved before checking volatility.
Signed-off-by: Ben Thomas <ben@benthomas.ca> (commit: 6bfb49b)
The purpose of this relo type is to relocate method pointers that already exist in the inlining table, as opposed to the TR_MethodPointer relo type that is used to relocate an arbitrary method pointer.
This commit is a temporary workaround for huge methods. The range of AArch64 conditional branch instruction is +/- 1MB. This commit gives up compilation when the size of the generated code exceeds that range.
Add baseReg dependency for POWER generatePairedStoreSequence
POWER generatePairedStoreSequence is missing baseReg dependency to ensure no RA register movement during the sequence. Adding baseReg condition to generatePairedLoadSequence dependency check to be consistent with generatePairedStoreSequence.
Add a new flag to GC heap stats and set it on percolate start for JNI Critical Region percolates. On GC cycle end, exclude the GC time from the GC CPU time if the flag is set.
The extra condition added to maximum heap heuristic checks causing no contraction when the computed heuristic was arbitrarily greater than 100 has also been removed.
Signed-off-by: Jason <jasonhal@ca.ibm.com> (commit: 7e9a012)
The optimization has O(n^2) complexity and is currently disabled for forward array traversals, and recently we've had to disable it under all array traversals for concurrent scavenge. As read barriers become more prominent, the introduction of the dataAddr pointer in the header will force us to further limit this optimization.
This optimization was originally introduced in 2010 for z196 z/Architecture processor to accelerate SPEC workloads. It showed minimal improvement (1.5%) according to historical data dug up prior to open sourcing the project. The hardware is hardly used today since it is so old. In recent iterations of the hardware, software prefetching has been discouraged due to advancements in hardware prefetching. Similar observations were seen when we removed prefetching from our zero memory routines.
Moreover, the phantom words past the end of the GC heap were removed several years ago, forcing us to disable this optimization for forward array traversals. As mentioned previously we have been encountering bugs in this optimization recently due to changes in the JVM and the fact that this optimization introduces loads on the heap which previously did not exist.
In the PR we detail performance numbers for several workloads where we believed this optimization may have provided a benefit. Across all platforms measured we no longer see a benefit (and sometimes a degradation) from running this optimization. Here we deprecate the optimization as we still have access to the original source code in the history if we ever need to resurrect this effort.
Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com> (commit: 46e59bf)
Update GC code to make sure whenever an indexable object moves, the object will always contain the correct dataAddr value, which is the address pointing right after array header.
Introduce pre/postObjectMoveForCompact wrappers for the EnvironmentDelegate pre/postObjectCompact API for object compaction fixup
This is phase 2 of https://github.com/eclipse/openj9/issues/11438
Signed-off-by: Igor Braga <higorb1@gmail.com> (commit: a51b0b0)
Access system class loader using a front-end query
Symbol Reference Table accessed the class loader directly, instead of going through the front-end query, which may lead to `TR_ASSERT_FATAL` triggered on JITServer.
Closes: #5837
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: ff9ac31)
Get TR_OpaqueMethodBlock correct under relocatable compilations
The _methodInfo field of TR_InlinedCallSite actually holds a TR_ResolvedMethod under relocatable compilation. As such, checkOSRLimits was incorrectly passing the TR_ResolvedMethod to OSRFrameSizeInBytes. This commit fixes this.
Introduce _CEE3164_ENVFILE as a temporary workaround
Due to an LE limitation, we need to pass environment variables via _CEE_ENVFILE. Because different launchers may unset _CEE_ENVFILE in order for child processes to pick up LIBPATH updates, we need to copy the _CEE3164_ENVFILE env var to _CEE_ENVFILE on first invocation of CEL4RO31.
While replacing with a new node in RegDepCopyRemoval, we check whether the passthrough node attached to the register dependency has a corresponding regStore, which is needed if a later optimization decides to split the blocks. Currently we throw an assert while creating the copyToNewVirtualRegister node under a dependency where the global register used by the passThrough differs from the global register used by the value node under it. As we are going to generate a regStore for that passthrough node, we do not need this strict assert.
Fixes: eclipse/openj9#11943
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: 6653517)
Update GC code to make sure whenever an indexable object moves, the object will always contain the correct dataAddr value, which is the address pointing right after array header.
Introduce pre/postObjectMoveForCompact wrappers for the EnvironmentDelegate pre/postObjectCompact API for object compaction fixup
This is phase 2 of eclipse/openj9#11438
Same PR/commit as https://github.com/eclipse/omr/pull/5795 with a concurrent scavenger fix. Previously we were updating dataAddr in indexable object headers inside Scavenger by calculating the object's layout based on the actual object. However, during concurrent Scavenger there can be scenarios where the object is in the middle of being copied, so the destination object won't contain the proper header. Instead, we can find this information from the forwarded header.
Signed-off-by: Igor Braga <higorb1@gmail.com> (commit: 79ec7ca)
Previously, a buffer size of 100 characters was passed to sprintf in the block validator. However, the format string itself is 94 characters, which doesn't leave enough room in the buffer for the print arguments. As a result, running the block validator (e.g. with paranoidOptCheck) can lead to a buffer overflow.
This commit fixes the issue by increasing the size of the buffer to 150 characters.
**_For background and results see https://github.com/eclipse/omr/issues/5829_**
> Using overhead data (busy/stall times for managing and synchronizing threads), Adaptive Threading aims to identify sub-optimal/detrimental parallelism and continuously adjusts the GC thread count to reach optimal parallelism.
_Adaptive Threading changes are implemented at the various phases of GC as follows, during:_
1. **Pre-collection _(during task/thread dispatch)_:** to adjust thread count based on the previous cycle’s recommendation
2. **Collection:** to gather data for parallelization overhead for the running cycle
3. **Post-collection _(after worker threads are suspended)_:** to project/calculate optimal thread count and give a recommendation for the next cycle based on the current GC that’s completed

Adaptive threading will be enabled by default. The user may choose to enable/disable adaptive threading through the `-XX:[+-]AdaptiveThreading` options. However, Adaptive Threading is ignored, even if it is enabled, when the GC thread count is forced (e.g. the user specifies `-Xgcthreads`). The user can also specify an upper thread limit for adaptive threading using the `-Xgcmaxthreads` option.
_The specifics of the implementation are as follows:_
- Thread count associated with a parallel task instance, indicates the projected optimal number of threads to complete the task.
- Drives Adaptive Threading
- Changes with each dispatch of the task, based on observations made when previously completing the same task
- Currently, adaptive threading is only used with _Scavenger_, hence this only applies to tasks of type `MM_ParallelScavengeTask`, so a recommended thread count is provided with a new `ParallelScavengeTask` instance (i.e. when scavenge is run)
- `getRecommendedWorkingThreads` introduced to the `MM_Task` base class; the base implementation returns `UDATA_MAX` (signifies no adaptive threading), overridden by `ParallelScavengeTask` to return the adaptive thread count.
during collection):
- `_workerScavengeStartTime` and `_workerScavengeEndTime` - Thread local task (scavenge) start/end timestamps taken when a worker thread starts/completes a task (scavenge). This is used to determine the time it takes a worker to start the collection task from the time a cycle starts and how long a worker waits for others when it has completed its task.
- `_adjustedSyncStallTime` - Similar to the existing `_SyncStallTime` stat **except** it accounts for Critical Section duration. That is, it subtracts the critical section duration from the stall time of the synced thread. This is because the stall time added by the critical section is independent of the number of threads being synchronized. This independent stall time cannot be adjusted for, hence it must be ignored. This is relevant for the `syncAndReleaseSingle/Master` APIs as they are the only APIs that record a stall time with critical sections; without these, `_SyncStallTime` == `_adjustedSyncStallTime`
- The existing `addToSyncStallTime` method is extended to update `_adjustedSyncStallTime` in addition to `_SyncStallTime`. `addToSyncStallTime` now takes `criticalSectionDuration` (defaults to 0) as an input; a value for it is passed when a critical section is executed prior to updating stall stats.
- `_notifyStallTime` - Used to record the stall times resulting from notifying other waiting threads. Note, this stat is not inclusive; it only records notify times relevant to adaptive threading.
- We are more concerned about recording 'notify_all' rather than 'notify_one' as 'notify_all' is dependent on the number of threads being notified.
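The relationship between the two stall statistics can be sketched as follows. The struct, the timestamp parameters and the clamping to zero are assumptions made for illustration; only the idea of subtracting the critical section duration (defaulting to 0) comes from the description above.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for per-thread stall statistics.
struct StallStats
{
    std::uint64_t syncStallTime = 0;
    std::uint64_t adjustedSyncStallTime = 0;

    // criticalSectionDuration defaults to 0, in which case both stats grow
    // by the same amount.
    void addToSyncStallTime(std::uint64_t startTime, std::uint64_t endTime,
                            std::uint64_t criticalSectionDuration = 0)
    {
        std::uint64_t stall = endTime - startTime;
        syncStallTime += stall;

        // Assumption: clamp at zero if the critical section is reported as
        // longer than the measured stall.
        if (stall > criticalSectionDuration)
            adjustedSyncStallTime += stall - criticalSectionDuration;
    }
};
```

A sync without a critical section leaves the two totals equal; a 60-unit stall that includes a 20-unit critical section adds 60 to the raw stat but only 40 to the adjusted one.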
- Introduced `calculateRecommendedWorkingThreads` routine
- Called at the end of each successful scavenge to project optimal threads for next scavenge
- Implements Adaptive Model _(see background for more info)_
- Calculates averages of stall stats and uses them as inputs to adaptive model, projected optimal thread count output is stored to new member `_recommendedThreads`.
- Timestamps taken for worker thread scavenge start/end time
- `notifyStallTime` recorded for `notify_all` on `_scanCacheMonitor` _(some instances are ignored as they are not relevant to adaptive threading such as backout cases)_
- `MergeThreadGCStats` updated to merge new scavenger stats (discussed above)
- trace point added here to breakdown thread local stats for adaptive threading
for Adaptive Threading
- Dispatcher now queries the task for the recommended thread count when determining the number of threads to release from thread pool to complete the task
- Ensures thread count is bounded properly to respect max thread count, either user provided adaptiveThreadCount or default max.
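A minimal sketch of the bounding step, under stated assumptions: the function and constant names are hypothetical, the sentinel mirrors the `UDATA_MAX` "no recommendation" convention mentioned earlier, and the lower bound of one thread is an assumption rather than documented behaviour.

```cpp
#include <algorithm>
#include <cstdint>

// Sentinel meaning the task has no adaptive recommendation.
const std::uintptr_t NO_RECOMMENDATION = UINTPTR_MAX;

// Picks how many threads to release for a task. `defaultCount` is what the
// dispatcher would use without adaptive threading; `maxCount` is the upper
// limit (user provided or default).
std::uintptr_t threadsToRelease(std::uintptr_t recommended,
                                std::uintptr_t defaultCount,
                                std::uintptr_t maxCount)
{
    std::uintptr_t wanted =
        (recommended == NO_RECOMMENDATION) ? defaultCount : recommended;

    // Assumption: never fewer than one thread, never more than the maximum.
    return std::max<std::uintptr_t>(1, std::min(wanted, maxCount));
}
```

For example, a recommendation of 64 with a maximum of 16 is capped at 16, while a task that returns the sentinel simply gets the default count.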
- `_syncCriticalSectionStartTime` & `_syncCriticalSectionDuration` introduced to record Critical Section times for adjusted stall time
- `_syncCriticalSectionStartTime` is recorded when all threads are synced and the released thread is about to exit the sync API to execute the critical section
- `_syncCriticalSectionDuration` is recorded when the thread executing the critical section is about to release the synced threads (indicating the critical section is complete). As synced threads are released, they update their stall time with the newly set critical section duration.
- `notifyStallTime` recorded for `notify_all` on synced threads
Refactor TR_AOTMethodInfo out of TR_InlinedCallSite
Previously, `TR_InlinedCallSite::_methodInfo` stored either a method pointer during a regular compilation, or `TR_AOTMethodInfo *` during an AOT compilation, despite `_methodInfo` having `TR_OpaqueMethodBlock *` type. This makes the code prone to subtle bugs like #5845.
This commit moves storage of `TR_AOTMethodInfo *` outside of `TR_InlinedCallSite`, putting it in `TR_InlinedCallSiteInfo`, which simplifies the definition of `TR_InlinedCallSite` and the places where it is used.
Closes: #5844
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: 3533681)
AArch64: Fix genericBinaryEvaluator for internal pointer case
Avoid reusing the source register when the node is `aladd` because the target register can contain an internal pointer. The exception is when the source register also contains an internal pointer and has the same pinning array pointer as the node.
Adjust cardCleaningThreshold for language specific kickoff
When transitioning from the tracing only state to the tracing and card cleaning state, account for the fact that the pre-calculated remaining free memory for the card cleaning kickoff threshold should be higher if the global kickoff was due to reasons other than low free memory, such as a language specific kickoff (class unloading).
The adjustment for card cleaning kickoff threshold is exactly the same as the difference between actual free memory at the time of global kickoff and expected free memory global kickoff threshold.
Without the adjustment, card cleaning and any other states after (most importantly final STW phase where we reclaim memory) are delayed till (incorrectly very low) pre-calculated card cleaning threshold is met, effectively negating any efforts of the early language specific kickoff.
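The adjustment described above amounts to simple arithmetic. The sketch below uses hypothetical names and assumes the adjustment is zero when the kickoff happened at or below the expected threshold.

```cpp
#include <cstdint>

// Hypothetical helper. Raises the pre-calculated card cleaning threshold by
// the amount of free memory that was still available beyond the expected
// global kickoff threshold when the (early) kickoff happened.
std::uint64_t adjustedCardCleaningThreshold(std::uint64_t preCalculatedThreshold,
                                            std::uint64_t freeAtKickoff,
                                            std::uint64_t expectedKickoffThreshold)
{
    // Assumption: no adjustment when kickoff was not early.
    std::uint64_t adjustment = 0;
    if (freeAtKickoff > expectedKickoffThreshold)
        adjustment = freeAtKickoff - expectedKickoffThreshold;

    return preCalculatedThreshold + adjustment;
}
```

With a pre-calculated threshold of 100 units, an expected kickoff at 400 units free and an actual kickoff at 900 units free, card cleaning would start once free memory drops to 600 units rather than waiting until it reaches 100.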
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: 2394198)
This helps avoid an unnecessary (duplicate) Global GC in case one thread is triggering the final STW step of Concurrent global (due to exhausted concurrent work), and another thread is triggering a Scavenge that ends up percolating to Global.
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: da66147)
Add flag to disable loadaddr specialization in x86 MemoryReferences
Presently in the x86 code generator, there is an optimization during MemoryReference creation that always assumes that accesses to local objects via a loadaddr should use the frame register, as opposed to evaluating (or re-using) the register associated with the loadaddr as the base register.
Add an environment variable `TR_useLoadAddrRegisterForLocalObjectMemRef` to disable this optimization for diagnostic purposes.
Move snippets-to-be-patched lists from Compilation to CodeGenerator
Migrate the following fields and getter functions from the Compilation class to CodeGenerator where they more naturally belong (they are used exclusively during the code generation phase):
Do not remove the Compilation code just yet as downstream consumer code also needs to be changed. Re-route Compilation functions to call the corresponding CodeGenerator function.
Compute Windows 10 build number in omrsysinfo_get_OS_version()
This fixes two problems. If _WIN32_WINNT_WIN10 isn't defined, which it isn't for VS2013, use deprecated GetVersionEx() to detect Windows 10. Also query the Windows 10 build number instead of hard coding an old build number.
AArch64: Add class unloading pic site to address constant
This commit changes `aconstEvaluator` to load address constant with `ConstantDataSnippet` if class unload assumption is required. It also changes `loadAddressConstantInSnippet` to add class unloading pic site to those snippets.
This commit adds AArch64 VFP instructions for vector add and bitwise inclusive or operations. The following data types are covered by the vector add instructions:
- VectorInt8
- VectorInt16
- VectorFloat
- VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 42d73ef)
This commit adds the following support for vector operations:
- Quad-word vector load and store operations, and relevant binary encodings.
- Handles vector registers for register dependency.
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: c842228)
AArch64: Add register related virtual method to ARM64MemInstruction
Add refsRegister/usesRegister/defsRegister/defsRealRegister/assignRegister to ARM64MemInstruction class and change child classes to call those methods.
This commit adds test cases for the vector add implementation to the relevant tril test, VectorTest, for the following vector data types:
- VectorInt8
- VectorInt16
- VectorFloat
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 7120a36)
This commit adds AArch64 VFP instructions for vector subtraction operations. The following data types are covered by these instructions:
- VectorInt8
- VectorInt16
- VectorFloat
- VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 036741e)
This symbol is intended to facilitate setting up computed calls for methods that are yet to be compiled. Also updated OMRNonHelperSymbols.enum with this symbol as well as j9VMThreadTempSlotFieldSymbol.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: 8385ad2)
This commit adds test cases for the vector subtraction tril tests in VectorTest for the following vector data types:
- VectorInt8
- VectorInt16
- VectorFloat
- VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: fd15d9a)
TR_X86ProcessorInfo is currently used in OMR because the compiler does not yet have access to the portlib. Therefore, it has to determine the processor features itself. To avoid initializing TR_X86ProcessorInfo multiple times, a flag is set. However, a race is possible if there are multiple compilations, as is the case in Eclipse OpenJ9. Because the processor feature flags were copied from the target TR::Environment set in the TR::Compilation object, if two racing compilations had different target environments, the feature flags could become inconsistent.
This commit initializes the processor info just once in OMR::CompilerEnv::initializeTargetEnvironment().
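For illustration only, one common C++ way to guarantee single initialization in the presence of racing callers is a function-local static. This is a different technique from the one used in this commit (which moves initialization into a single startup call), and all names below are hypothetical.

```cpp
// Hypothetical processor description; the real structure is much richer.
struct ProcessorInfo
{
    unsigned featureFlags;
};

// Counts how often the (pretend) hardware query runs, for demonstration.
int g_queryCount = 0;

ProcessorInfo queryProcessor()
{
    ++g_queryCount;
    return ProcessorInfo{0x1u};
}

// Since C++11 the initialization of a function-local static is performed
// exactly once, even if several threads call the function concurrently.
const ProcessorInfo &processorInfo()
{
    static const ProcessorInfo info = queryProcessor();
    return info;
}
```

However many compilations call `processorInfo()`, the query runs once and every caller sees the same flags.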
RISC-V: introduce RVLinkageProperties::initialize() to initialize derived members
Linkage properties have a number of members whose values can be derived from register flags. This commit introduces a new method initialize() that computes these values based on _registerFlags.
Linkages are expected to fill in register flags and then call initialize() to finish properties initialization.
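The pattern of filling in register flags first and deriving everything else afterwards can be sketched like this. The flag bits, register count and derived members are invented for the example and are not the real RISC-V linkage definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flag bits.
enum : std::uint32_t
{
    Preserved       = 0x1,
    IntegerArgument = 0x2
};

// Hypothetical, heavily simplified linkage properties.
struct LinkageProperties
{
    static constexpr std::size_t NumRegisters = 8;

    std::uint32_t registerFlags[NumRegisters] = {};

    // Derived members, computed by initialize().
    std::size_t numPreserved = 0;
    std::size_t numIntegerArguments = 0;

    // Recomputes every derived member from registerFlags.
    void initialize()
    {
        numPreserved = 0;
        numIntegerArguments = 0;
        for (std::size_t r = 0; r < NumRegisters; ++r)
        {
            if (registerFlags[r] & Preserved)
                ++numPreserved;
            if (registerFlags[r] & IntegerArgument)
                ++numIntegerArguments;
        }
    }
};
```

A linkage would set the flags for each register and then call `initialize()` once, so the derived counts cannot drift out of sync with the flags.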
Signed-off-by: Jan Vrany <jan.vrany@fit.cvut.cz> (commit: b194b27)
Ensure consistency between resolved method and symbol wrt isInterpreted
In a relocatable compilation, if the resolved method denotes an interpreted method, ensure that the resolved method symbol also denotes it is interpreted.
On POWER, the peephole optimization for iushr and ishr uses rlwinm. However, rlwinm does not take into consideration the sign extension that is required by iushr/ishr. This commit fixes the case for ishr so that the optimization is only applied in the correct cases.
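The distinction that matters here is between a logical and an arithmetic right shift. The sketch below models both on 32-bit values, assuming shift amounts in the range 0 to 31; it is not the peephole's code. A rotate-and-mask sequence that clears the upper bits behaves like the logical variant.

```cpp
#include <cstdint>

// Logical shift: vacated upper bits are filled with zeros.
std::int32_t logicalShiftRight(std::int32_t value, unsigned amount)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(value) >> amount);
}

// Arithmetic shift: vacated upper bits are filled with copies of the sign bit.
// Written with explicit masking so the result does not depend on how the
// compiler implements >> for negative operands.
std::int32_t arithmeticShiftRight(std::int32_t value, unsigned amount)
{
    std::uint32_t shifted = static_cast<std::uint32_t>(value) >> amount;
    if (value < 0 && amount > 0)
        shifted |= ~(UINT32_MAX >> amount);
    return static_cast<std::int32_t>(shifted);
}
```

The two agree for non-negative inputs, but for -16 shifted by 2 the arithmetic form gives -4 while the logical form gives 1073741820.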
Signed-off-by: Mohammad Nazmul Alam <mohammad.nazmul.alam@ibm.com> (commit: 90229d2)
This commit adds AArch64 VFP instructions for vector multiplication & division operations. The following data types are covered by these instructions:
- VectorInt8
- VectorInt16
- VectorFloat
- VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 2f0a0e3)
This commit adds AArch64 implementations for vector multiplication & division operations. The following data types are covered by these operations:
- VectorInt8
- VectorInt16
- VectorFloat
- VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: fdb9b28)
Define isArrayCompTypeValueType method for Value Propagation
This change adds a virtual method and base implementation of isArrayCompTypeValueType to OMR::ValuePropagation that can be used to determine whether the component type of an array is definitely, might be or is definitely not a value type.
The base implementation simply assumes any array's component type might be a value type if value type support is enabled, and that it is definitely not a value type otherwise.
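The base behaviour can be sketched with a three-valued result. The enum, class and parameterless method below are simplified stand-ins; the real method's signature and tri-state type are not shown in this message and may differ.

```cpp
// Hypothetical tri-state result.
enum YesNoMaybe { No, Yes, Maybe };

// Hypothetical, simplified base class.
class ValuePropagationBase
{
public:
    explicit ValuePropagationBase(bool valueTypesEnabled)
        : _valueTypesEnabled(valueTypesEnabled) {}
    virtual ~ValuePropagationBase() {}

    // Base implementation: without more knowledge, any array's component
    // type might be a value type when support is enabled, and is definitely
    // not one otherwise. Downstream projects can override this.
    virtual YesNoMaybe isArrayCompTypeValueType() const
    {
        return _valueTypesEnabled ? Maybe : No;
    }

private:
    bool _valueTypesEnabled;
};
```

A downstream override with access to class information could return `Yes` or `No` for specific arrays and fall back to the base answer otherwise.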
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: d429462)
The trigger point for the end of the final STW increment of the concurrent global GC event was mistakenly removed as part of the VGC re-org, in the belief that Verbose GC was its only user. However, global GC scheduling and compact heuristics also relied on it indirectly.
See: https://github.com/eclipse/omr/pull/5688
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: 39dd3d8)
AArch64: Improve code generation for i2l-lshl sequence
This commit improves code generation for the IL sequence i2l-lshl that often appears in offset calculation for array access, by using sbfm instruction.
This commit adds test cases for the vector multiplication & division tril tests in VectorTest for the following vector data types:
- VectorInt8
- VectorInt16
- VectorFloat
- VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 233e0f0)
In copy method we checked 2x if forwarding succeeded. It's needed for Concurrent Scavenger for 2 different paths: small objects (allowing duplicates) and large objects (disallowing duplicates).
These 2 if checks (and their associated then&else code blocks) were serialized, but now they are nested. It makes it somewhat easier to read, but more importantly more friendly for later code versioning for CS and nonCS case.
A small compromise is that handling of failed forwarding path now is duplicated at the level of source code (now at each end of those 'if's, while it used to be only once at the very end). Still, in run time, it's only handled once.
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: 51cde66)
Fix paddi and lxvdsx for vsplatsEvaluator to have GPR register
P10 uses paddi in conjunction with lxvdsx for vsplats. The paddi opcode requires a GPR for correct address calculation, as lxvdsx uses a GPR for loading the data. The register passed to loadfloatConstant can be of type FPR/VSX, thus leading to an assert in the RegisterAssigning phase. This commit ensures that paddi and lxvdsx use a GPR for the constant data snippet address.
Fixes: openj9#12383
Signed-off-by: Mohammad Nazmul Alam <mohammad.nazmul.alam@ibm.com> (commit: 96c559a)
AArch64: Use correct parameter for memory barrier instructions
Use the correct parameters for `dmb` instructions.
- For the shareability domain parameter, the correct one is `inner shareable domain`.
- The barrier required after a volatile load is an acquire barrier. A full barrier is not required.
Check if current phase is concurrent when trying to yield
When we consider yielding in Concurrent Scavenger (prematurely terminating the scanning loop, while in the concurrent phase, due to an external Exclusive Access Request being raised), we can now rely on an explicit flag that tells us we are really in the concurrent phase. Yielding while in STW phases is meaningless.
Previously, we checked that we were not in STW mode by checking that exclusive access was in the 'pending' state, but exclusive VM access can actually be pending even when the state is already 'exclusive'. This is true when exclusive VM access was requested by an external thread. In this case, we could incorrectly conclude that we are not in concurrent mode, and never yield.
Signed-off-by: Aleksandar Micic <amicic@ca.ibm.com> (commit: 5c5e544)
The Eclipse Foundation no longer requires commits to be signed. Modify documentation and tooling to reflect that. Also, clean up some of the related wording in the CONTRIBUTING.md guide.
See https://www.eclipse.org/lists/eclipse.org-committers/msg01291.html (commit: 8d093fc)
Add AbsoluteHelperAddress Relocation for PPCSystemLinkage calls
The AbsoluteHelperAddress relocation stores and patches an absolute helper address in a series of instructions. This commit uses this relocation for calls with PPCSystemLinkage.
Use a human readable name for the object comparison non-helper
The object comparison non-helper is not recognized by TR_Debug::getName, so it's shown in tree listings as "unknown[#337 helper Method]". This change has its name reported as <objectEqualityComparison>. (commit: 526e63b)
* like omrstr_ftime(), with a flag for requesting UTC formatted time
* support '%a' format specifier
* simplify implementation for Windows
* add new tests
* clean up and remove what is now dead code
Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com> (commit: 42b04a6)
Introduce `concurrent-start/end` stanza pair for Concurrent Global Marking to match with scavenging. Concurrent-start stanza is printed in initial STW phase as we transition away from init complete. Concurrent-end can be printed either before final STW (when we exhaust concurrent work) or during STW (if halted).
- New API in `MM_Collector`:
  - `notifyAcquireExclusiveVMAccess` to notify local collector
- New API in `MM_ConcurrentGC`:
  - `preConcurrentInitializeStatsAndReport` for triggering concurrent-start event
  - `postConcurrentUpdateStatsAndReport` for triggering concurrent-end event
  - `notifyAcquireExclusiveVMAccess` to record the time stamp when concurrent marking is halted or completed in STW
- Introduce a subclass `MM_ConcurrentMarkPhaseStats` of `MM_ConcurrentPhaseStatsBase`
- New fields in `MM_ConcurrentPhaseStatsBase`
  - `_concurrentCycleType`
- New API in `MM_VerboseHandlerOutput`
  - `getTagTemplate` to print with optional `reasonForTermination`
- New API in `MM_VerboseHandlerOutputStandard`
  - `getConcurrentTypeString`
  - `getCardCleaningReasonString`
  - `handleGlobalMarkEndNoLock`
Copy the .x files to the ARCHIVE_OUTPUT_DIRECTORY to be consistent with the CMake model. However, we continue to copy the .x files to the RUNTIME_OUTPUT_DIRECTORY as well, to work around a bug in CMake which causes it to look for the .x files in the wrong directory.
In omrcel4ro31, add OMR_EBCDIC guards around two a2e calls to ensure the module and function names are EBCDIC regardless of whether OMR is being built with EBCDIC or ASCII convlit.
Update CELQGIPB offset from CAA fast vector array to +96.
This API is unimplemented and calling it and using the result will likely produce a crash. This API is used in dead code paths:
1. DFP 2. arraycmpWithPad 3. pdneg 4. vsetlem
under certain circumstances. Please see the self review in the PR which is associated to this commit for explicit reasons as to why failing the compilation in each case is safe and warranted. (commit: 4a14a85)
This symbol represents the floatTemp1 field in j9vmthread. It will provide another mechanism for the compiler to insert temporary information at run-time that the VM can use, similar to how tempSlot is used. While the name suggests this field would contain floats, other data types could also be stored.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: e8c0516)
This commit is the main part of the change to deprecate DFP related code. Here we remove all the opcodes and datatypes related to DFP and clean up some intertwined APIs.
Disable x87 floating point for 32-bit code generation
The minimum target processor level for OMR code generation is Pentium 4, which has support for SSE2. All modern operating systems are assumed to support preserving the SSE state. Disable x87 code generation in preparation for removal.
This option is inconsistently used throughout the codebase. It was originally introduced when OOL code sequences were first implemented as a fallback path due to register allocation issues.
We have been using OOL sequences for almost a decade now and in some places we don't even have a guard or fallback path against using OOL, so enabling this option is not safe. For these reasons it is best to just remove it. (commit: 2f617f6)
The APIs for counting various statistics have been disabled by default since they are guarded by the `DEBUG` build time flag which we don't build with. This code is dead. Once we are further along unifying the register allocator across codegens we can come back and re-introduce these stats in a cross-platform way. (commit: a2c7728)
Remove isOutOfLineColdPath guards from doRegisterAssignment
Tracing of register allocation was previously guarded with the `isOutOfLineColdPath` path. However this is unnecessary since the start of the register allocation never begins in an OOL path. (commit: 24a5774)
Migrate TBEGIN register allocator code to S390SILInstruction
We override the `assignRegisters` API and migrate the code related to register allocation of `TBEGIN` and `TBEGINC` instructions to this overloaded virtual API. This further cleans up the register allocation code. (commit: 739caac)
Migrate DCB register allocation code to S390DebugCounterBumpInstruction
The code which looks for a free real register to associate with the DCB instruction can be migrated to the overloaded `assignRegisters` virtual API. (commit: 8c45dc1)
Migrate handleLoadWithRegRanges to OMRInstruction.cpp
We call this API at the end of `assignRegisters` instead of in `doRegisterAssignment` which eliminates a cross-platform incompatibility. (commit: 1506708)
Unify doRegisterAssignment on all platforms except x86
Now that we have gotten `doRegisterAssignment` to look exactly the same on several platforms we can deprecate the platform specific versions. Unfortunately on x86 we seem to be doing something completely different from all the other platforms, so in general tackling register allocation on x86 will need to be done separately. (commit: 6ed62c2)
Define __EXTABI__ for Port library compilation on AIX64
With enabling vector code generation for Project Panama, AIX systems need to define __EXTABI__ for full Vector ABI support. This is required for the signal handling code in the port library.
Introduce a new default scan ordering type 'none'. In gencon configuration, scan ordering will be set to hierarchical if it has not already been overridden by an option. In balanced, scan ordering will be set to dynamic BF if it has not already been overridden by an option.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: d24ef7d)
Unify TR_*OutOfLineCodeSection::assignRegisters across codegens
This API is almost identical on all our codegens. There are very small differences which have been `#ifdef` out. We're currently not 100% sure that these guarded paths are needed so we will leave them in for now and look to eventually unify the entire function as we get closer to true codegen unification. (commit: 1d23350)
This option is not set anywhere and there is no option flag present which can enable this option. Therefore the code guarded by it is dead. (commit: 20d6734)
``` warning: format specifies type 'unsigned long' but the argument has type 'chunk_t' (aka 'unsigned long long') [-Wformat] warning: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat] ``` (commit: 4ba2ff8)
Enable warnings as errors in compiler component on OSX
The commits have addressed all warnings on an OSX build, so to prevent new warnings from being introduced we enable warnings as errors only on this platform. Future platforms will follow. (commit: 36f9278)
This commit adds AArch64 implementation for vector Negate operation. This implementation is for the following data types- - VectorInt8 - VectorInt16 - VectorFloat - VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 56b4e29)
See #1662 for some further details. This will not be easy to solve because it is not straightforward to convert `TR::JitConfig` to a POD so we can use `offsetof`. This will hopefully be solved via the options framework overhaul, so fixing this warning properly is not worth the effort at the moment. Instead we locally disable this warning around the problematic area. (commit: 140c336)
Convert RegisterDependencyGroup to an extensible class on Z
We migrate all the code from the Z `TR_S390RegisterDependencyGroup` into an extensible `RegisterDependencyGroup` class. Then we create an extension of this class in the Z codegen to implement some codegen specific APIs. (commit: e38eed9)
Prepare replacing MemoryPoolBumpPointer with MemoryPoolAddressOrderedList
In order to reuse fragmented free memory in balanced GC, we need to use MemoryPoolAddressOrderedList instead of MemoryPoolBumpPointer to manage free memory. Update related code to prepare for the replacement.
- new virtual method addFreeMemoryPostProcess for the extra process after new free memory is added in freelist.
- support incremental card alignment for free memory list if it is needed.
This commit adds Tril test-cases for the vector bitwise operations in VectorTest. Following operations are covered- - Vector And - Vector Or - Vector Exclusive Or - Vector Negation - Vector Not
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 9d8c4a2)
This pseudo-instruction is migrated to the common codegen. In addition the Z codegen has been updated to use the new naming scheme used across Power and AArch64 to use lowercase for these pseudo-instructions. (commit: 8a1541b)
Moved all instances of the _ialoadUnneeded attribute to the Z subfolder: - Moved "List<TR_Pair<TR::Node, int32_t> > _ialoadUnneeded;" from compiler/codegen/OMRCodeGenerator.hpp to z/codegen/OMRCodeGenerator.hpp
- Moved '_ialoadUnneeded(comp->trMemory()),'from compiler/codegen/OMRCodegenerator.cpp to z/codegen/OMRCodeGenerator.cpp
- Updated commit message to follow guidelines (commit: 6434715)
Handle both AddressOrdered and BumpPointer regionType in setMarkMapValid
To smooth the transition when replacing MemoryPoolBumpPointer with MemoryPoolAddressOrderedList, update setMarkMapValid() to handle both AddressOrdered and BumpPointer regionType. Roll back the changes in ObjectHeapBufferedIterator.
Prevent LoopVersioner from attempting to privatize BCD type nodes
A crash would occur in decReferenceCount() after privatizing a pdLoadi because the Z evaluator was not expecting a pdLoad Opcode. This fix will disable privatizing all BCD type nodes.
Signed-off-by: Kevin Langman <langman@ca.ibm.com> (commit: 4c68263)
Add command line option to make inliner more aggressive
Added `enableAggressiveInlining` command line option which, under the covers, will enable other options or set threshold values such that the Inliner is more aggressive in inlining callees. This is done in the `jitPostProcess()` phase, so it may override other inliner specific options, also present on the command line. The new option is directed at experimentation and not geared towards production use. Moreover, the new option cannot be applied in a subset, but only globally, to all the methods.
Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com> (commit: b079422)
xlpCodeCacheTests fails on AIX because the default code cache large page size 64K is absent in the list of available code cache large page sizes output by `-verbose:sizes`.
Include 64K in the list of available code cache large page sizes for the test to parse as the expected page size when no `-Xlp` option is supplied and the default page size is used.
AArch64: Improve code generation for i2l-lshl sequence
This commit improves code generation for the IL sequence i2l-lshl that often appears in offset calculation for array access, by using `sbfm` instruction.
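As a rough illustration of why a single signed bit-field instruction can cover the i2l-lshl pair, the sketch below emulates the semantics of the `sbfiz` alias of `sbfm` in plain C++ and compares it with sign-extending and then shifting. The helper names `i2lThenShift` and `emulateSbfiz` are hypothetical, and this is not the evaluator code from the commit.

```cpp
#include <cstdint>

// Reference semantics of the IL pair: sign-extend a 32-bit int (i2l),
// then shift left (lshl). The shift is done on the unsigned
// representation to avoid undefined behaviour on negative values.
inline int64_t i2lThenShift(int32_t value, unsigned shift)
{
    uint64_t extended = static_cast<uint64_t>(static_cast<int64_t>(value));
    return static_cast<int64_t>(extended << shift);
}

// Emulation of "sbfiz Xd, Xn, #lsb, #width": take the low `width` bits
// of the source, sign-extend them, and place them at bit `lsb`.
// Assumes 1 <= width <= 64 and lsb + width <= 64.
inline int64_t emulateSbfiz(uint64_t source, unsigned lsb, unsigned width)
{
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    uint64_t field = source & mask;
    uint64_t signBit = 1ULL << (width - 1);
    // Classic sign-extension trick: flip the sign bit, then subtract it.
    uint64_t extended = (field ^ signBit) - signBit;
    return static_cast<int64_t>(extended << lsb);
}
```

With `width` fixed at 32, `emulateSbfiz(x, k, 32)` matches `i2lThenShift(x, k)` for shift amounts from 0 to 32, which is the range where `lsb + width` stays within 64 bits.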
Limit Incremental Card Alignment under 64 bit system
Incremental Card Alignment is designed only for 64-bit systems and the code should never run on 32/31-bit systems. Use the OMR_ENV_DATA64 precompile define to avoid side effects on 32/31-bit systems.
Replace method in VPHandlers.cpp with equivalent method in VP
isArrayCompTypeValueType was originally defined as a static function in VPHandlers.cpp, but an equivalent virtual method that downstream projects can override is now defined in OMR::ValuePropagation. This change eliminates the function in VPHandlers.cpp and has that file use the method in ValuePropagation instead.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 6714867)
Enable OSR for profiling compilations with JProfiling
Profiling compilations that used the JIT Profiler had to disable OSR, because the method had to be duplicated, size would become an issue, and profiling compilations are very short-lived. With JProfiler we do not need to duplicate the method body, and the counters that collect profiling data are inserted in the code. This commit enables OSR and switches the HCR mode from traditional to osr for profiling compilations when JProfiler is used. This allows us to collect an accurate profile of the method for use in subsequent compilations, which have OSR enabled by default.
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: 8caf6fe)
Reduce MemoryPoolAddressOrderedList casting in Usage
Use virtual method callbacks in the base class (MemoryPool) to reduce casting to the extended class. Add empty virtual methods in the base class. Rename recycleHeapChunk(env,...) to recycleHeapChunkForFreeList(env,...).
Refactor CopyForward to reuse fixupForwardedObject
Refactor CopyForward to be more consistent with Scavenger by reusing existing fixupForwardedObject method from ObjectModel. This allows for the simplification of calculateObjectDetailsForCopy for balanced as doesObjectNeedHash argument is no longer needed.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: c57a70c)
Inherit `OMR::InstOpCodeConnector` like on all other platforms. We had to adopt the use of the common pseudo-instructions in the x86 codegen to make this change. (commit: 3a5345e)
These classes will form the base of the transition from `X86OpCodes` and `X86OpCode` to the new cross platform `TR::InstOpCode`. For now much of the implementation is left empty as we try to slowly add functionality from X86Ops.hpp. (commit: 341a6b3)
We temporarily remove this API because of a naming conflict in `TR_X86OpCode` class. To avoid having to change the name of this function for purposes of this work we simply temporarily delete the API in `InstOpCode` base class until we have fully migrated `TR_X86OpCode`, at which point we can reinstate the original API. (commit: 36f0e3d)
This class can now be replaced by `OMR::X86::InstOpCode`. We simply copy over all the APIs from the former to the latter. There is no change in functionality following this commit.
Further we remove OpBinary.cpp and replace it with OMRInstOpCode.cpp and we get rid of the "_inlines" file for X86Ops. (commit: ca28401)
This is done in preparation to parse out the mnemonics from this file and write them into OMRInstOpCode.enum so that both tables generated via X86Ops.ins and the `Mnemonic` enum have the same lengths. (commit: 54ca99b)
Alias TR_X86OpCodes with OMR::InstOpCode::Mnemonic
We also introduce a new file temporarily to make the transition easier. This new file `#define` all the existing global scoped mnemonics to map to their `OMR::InstOpCode::Mnemonic` equivalents. We will eventually get rid of this file as we migrate instructions from the global scope. (commit: 23cc13b)
Fix construction of log file name with TR_EnablePIDExtension
* put pid and time in separate buffers so the file name can be created with a single call to sprintf(), which also eliminates the warning for strncat(buf, ".", 1) * fix bounds checking * fix operation order in getTimeInSeconds() * use sizeof instead of hard-coded constants
Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com> (commit: d56ec61)
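The approach described in the log file name fix above (separate buffers, one formatting call, `sizeof` instead of hard-coded sizes) can be sketched roughly as follows. The helper name `buildLogFileName`, the buffer sizes and the `<base>.<pid>.<time>` layout are assumptions for illustration only, and `snprintf` is used here in place of `sprintf` so the bounds check is explicit.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<base>.<pid>.<time>" with one formatting call.
// Returns an empty string if the result would not fit in the buffer.
inline std::string buildLogFileName(const char *baseName, long pid, long timeInSeconds)
{
    char pidBuf[24];
    char timeBuf[24];
    char fileName[256];

    // Format the pid and the time into their own buffers first.
    std::snprintf(pidBuf, sizeof(pidBuf), "%ld", pid);
    std::snprintf(timeBuf, sizeof(timeBuf), "%ld", timeInSeconds);

    // A single call assembles the final name; no strncat(buf, ".", 1) needed.
    int written = std::snprintf(fileName, sizeof(fileName), "%s.%s.%s",
                                baseName, pidBuf, timeBuf);

    // Bounds check based on sizeof rather than a hard-coded constant.
    if (written < 0 || static_cast<std::size_t>(written) >= sizeof(fileName))
        return std::string();

    return std::string(fileName);
}
```

For example, `buildLogFileName("jit.log", 1234, 5678)` yields `jit.log.1234.5678` under these assumptions.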
On Z, the trees lowering phase checks whether each aloadi node encountered while walking the trees appears under a nopable inline guard. The conditions used to check whether a NOP would be generated for the parent virtual guard node (in which case the aloadi would not be evaluated) were incorrect. In some cases the aloadi was marked unneeded even though the inline test would be generated, which could eventually cause incorrect functional behaviour.
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: f285fc7)
The i2l-lshl optimization skips assigning a register to the i2l node. When the i2l node has two or more references, the i2l node is evaluated later again, and it leads to unexpected results.
Allow inliner policy to decide whether to remove differing targets
Previously, for direct calls, targets that appear to be a different method from the one specified by the call node's symbol reference would always be removed.
This behaviour is still the default, but the new query allows downstream projects to customize the conditions under which those targets are removed, e.g. by exempting certain methods. (commit: d4f24cc)
isDebugExtension was extended in TR_DebugExt to denote that the object was not the traditional TR_Debug object. However, as TR_DebugExt no longer exists, there is no need for this virtual method.
Now that we have unified all evaluator names across all codegens we no longer have the need for per-codegen tables since the names are all the same! This means we can remove these table files. (commit: f753871)
Now that we have an evaluator for each and every IL opcode, we can no longer compare against `unImpOpEvaluator` or `badILOpEvaluator` to determine whether we have implemented an evaluator for that specific IL.
Instead we have to override this function and manually return false for the specific set of unimplemented opcodes. (commit: 6966969)
Add an option to buffer expensive JITServer compilations
This commit adds an option `TR_DisableJITServerBufferedExpensiveCompilations`, which can be enabled using `-Xjit:disableJITServerBufferedExpensiveCompilations` command line option. The option is used by OpenJ9 to enable buffering of expensive local compilations on JITServer client.
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: 6d9e4dc)
Improve code generation for inlineArrayCmp on Power10
The current inlineArrayCmp has some drawbacks that affect its performance. The key deficiencies are expensive comparison opcode usage, a single byte compare loop in the residue loop, many branches, etc.
This commit uses wider load by utilizing power10 instructions and reduces the number of branches in the generated code. If merged, this patch enables more performant inlineArrayCmp on power10+ system. (commit: a95aba5)
POWER CG implements instanceOfEvaluator but not supportsInliningOfIsInstance. This commit implements it, allowing a call node to Class.isInstance() to be changed to an instanceof node.
Use addAtomic for incrementing counters of OMRMemCategory
Use addAtomic for incrementing counters of OMRMemCategory instead of spinning with compareAndSwapUDATA. Similarly, use subtractAtomic for decrementing counters.
This change simply makes omrmemcategories.c use the existing implementation of atomic add/sub in AtomicSupport.hpp, which is identical to the implementation it replaces in omrmemcategories.c. This commit should therefore introduce no functional or performance change, but it will enable us to optimize the implementation of atomic add/sub in a single file in the future.
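To make the difference between the two styles concrete, here is a minimal sketch using `std::atomic` rather than the OMR helpers in AtomicSupport.hpp. The function names are hypothetical; the first spins on compare-and-swap, the second uses a single atomic add, and both return the updated value.

```cpp
#include <atomic>
#include <cstdint>

// Spinning style: retry compare-and-swap until no other thread interfered.
inline uintptr_t addWithCompareAndSwap(std::atomic<uintptr_t> &counter, uintptr_t delta)
{
    uintptr_t oldValue = counter.load();
    while (!counter.compare_exchange_weak(oldValue, oldValue + delta)) {
        // compare_exchange_weak refreshed oldValue with the current value; retry.
    }
    return oldValue + delta;
}

// Atomic add style: one call, no visible retry loop at the call site.
inline uintptr_t addWithFetchAdd(std::atomic<uintptr_t> &counter, uintptr_t delta)
{
    return counter.fetch_add(delta) + delta;
}
```

Both produce the same counter value; keeping the operation behind one helper means any later tuning happens in one place.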
Options that don't have the first character of the message as "F" in the option table entry will not be printed in "options in effect" when the option is specified. Thus, I added 'F' to the messages of OMR options, so that the options specified would also be printed to "options in effect" in verbose logs.
similar to the PR in openj9: #12790
Signed-off-by: Eman Elsabban <eman.elsaban1@gmail.com> (commit: 8b3021d)
Add a method to get maximum heap size for zOS compressedrefs mode
The heap size is determined via getUserExtendedPrivateAreaMemoryType(); Moved omriarv64.h from port/zos390 to include_core/unix/zos; Added two constants MAXIMUM_HEAP_SIZE_RECOMMENDED_FOR_COMPRESSEDREFS & MAXIMUM_HEAP_SIZE_RECOMMENDED_FOR_3BIT_SHIFT_COMPRESSEDREFS (moved from OpenJ9 project)
Signed-off-by: Jason Feng <fengj@ca.ibm.com> (commit: bc31ba4)
Includes getter/setter functions. This flag is used to identify symbols corresponding to dummy TR_ResolvedMethod that are not really resolved, but treated as such as the resolution mechanism involves resolving the arguments instead. An example of that is linkToStatic, which is a VM internal native call that is created for unresolved invokedynamic and invokehandle bytecodes.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: 316fe47)
This loop expected that, for a given exitTreeTop, the next tree exitTreeTop->getNextTreeTop() would remain fixed even if the current block is removed from the trees. However, we shouldn't assume anything about the next/prev pointers of trees that have been deleted, since it's highly unclear which trees are supposed to be before/after a deleted tree. And indeed, the removal of the current block can sometimes update the next pointer. In particular, this happens when removing an empty block (in the following example, block_X) with two predecessors, one of which falls through (block_F) and one of which branches (block_B):
BBStart <block_F> ... (can fall through) BBEnd </block_F>
BBStart <block_X> BBEnd </block_X>
BBStart <block_N> ... BBEnd </block_N>
BBStart <block_N'> ... BBEnd </block_N'>
When removing block_X, redundant goto elimination (indirectly) calls TR::Block::insertBlockAsFallThrough() to make block_F fall through directly to block_N. This effectively clips out block_N's trees and then reinserts them immediately after block_F. As a result, before block_X is deleted, the trees are temporarily in the order shown below, where the tree immediately following the exit of block_X is the entry of block_N', not block_N:
BBStart <block_F> ... (can fall through) BBEnd </block_F>
BBStart <block_N> ... BBEnd </block_N>
BBStart <block_X> BBEnd </block_X>
BBStart <block_N'> ... BBEnd </block_N'>
The typical consequence of the mistaken assumption that the next tree remains stable would be just that processing skips block_N and goes directly to block_N'.
However, this assumption can cause a crash when redundant goto elimination is requested on block_X specifically. In that case, endTree is the entry of block_N, and because block_N is skipped, the loop doesn't terminate properly. Eventually it runs past the last block and tries to do treeTop->getNode() when treeTop is null.
The crash went unnoticed because the code in dead trees elimination that requests redundant goto elimination on particular blocks was instead mistakenly requesting a pass over the entire method.
To prevent these issues, redundant goto elimination now gets the entry of the next block before attempting to remove the current block. (commit: e495805)
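The ordering hazard can be modelled with a plain `std::list`, as in the sketch below. This is only an analogy under stated assumptions: `removeEmptyBlock` and `walk` are hypothetical helpers, blocks are represented by their names, and the splice stands in for what `TR::Block::insertBlockAsFallThrough()` does to the trees.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

using BlockList = std::list<std::string>;

// Hypothetical stand-in for removing an empty block: its successor is first
// moved in front of it (as when the fall-through predecessor is rewired),
// then the empty block is erased. Returns whatever followed the empty block
// at the moment it was erased.
inline BlockList::iterator removeEmptyBlock(BlockList &blocks, BlockList::iterator emptyBlock)
{
    BlockList::iterator successor = std::next(emptyBlock);
    if (successor != blocks.end())
        blocks.splice(emptyBlock, blocks, successor); // successor now precedes emptyBlock
    return blocks.erase(emptyBlock);
}

// Walks the blocks, removing the one named emptyName, and returns the names
// of the blocks that were actually processed.
inline std::vector<std::string> walk(BlockList blocks, const std::string &emptyName, bool captureNextFirst)
{
    std::vector<std::string> visited;
    BlockList::iterator it = blocks.begin();
    while (it != blocks.end()) {
        BlockList::iterator next = std::next(it); // captured before any removal
        if (*it == emptyName) {
            BlockList::iterator afterRemoval = removeEmptyBlock(blocks, it);
            if (!captureNextFirst)
                next = afterRemoval; // models trusting the next pointer after removal
        } else {
            visited.push_back(*it);
        }
        it = next;
    }
    return visited;
}
```

With the blocks `block_F, block_X, block_N, block_N'`, capturing the next entry first processes `block_N`, while reading it after the removal jumps straight to `block_N'`.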
Implementing ArrayCopyBNDCHKEvaluator() on AArch64
- This commit adds ArrayCopyBNDCHKEvaluator() and compareIntsAndBranchForArrayCopyBNDCHK() methods. - ArrayCopyBNDCHKEvaluator() is required for arrayCopyEvaluator() implementation.
Signed-off-by: Siri Sahithi Ponangi <sahithi.ponangi@unb.ca> (commit: b753913)
``` warning C4717: 'TR::assert_with_instruction_detail': recursive on all control paths, function will cause runtime stack overflow ``` (commit: b09bfc2)
``` warning C4291: 'void *TR_RelocationRecord::operator new(size_t,TR_RelocationRecord *)': no matching operator delete found; memory will not be freed if initialization throws an exception ``` (commit: aa4cbb1)
``` warning C4351: new behavior: elements of array 'TR_PersistentMemory::_totalPersistentAllocations' will be default initialized ``` (commit: 946ac17)
``` warning C4345: behavior change: an object of POD type constructed with an initializer of the form () will be default-initialized ``` (commit: 3de4141)
``` warning C4996: 'std::_Copy_impl': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators' ``` (commit: 4bbec8c)
Fix corner case negation of INT_MIN in constrainIabs
Similarly with `LONG_MIN` in `constrainLabs`. This functional issue was introduced while fixing warnings, and this fix has been separated into a commit of its own to explicitly isolate it.
For full details see the excellent explanation in the following review: https://github.com/eclipse/omr/pull/6030#discussion_r641981734 (commit: fd52af9)
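For readers unfamiliar with the corner case: negating the minimum 32-bit integer overflows in signed arithmetic, so an absolute-value computation needs to handle it explicitly. The sketch below is not the `constrainIabs` code; it assumes wrap-around (two's complement) semantics for the IL operation, and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Negation with wrap-around semantics. Doing the arithmetic on the unsigned
// representation avoids the undefined behaviour of computing -INT_MIN.
// The conversion back to int32_t is modular on mainstream compilers
// (and guaranteed to be so from C++20 onwards).
inline int32_t wrappingNegate(int32_t value)
{
    return static_cast<int32_t>(0u - static_cast<uint32_t>(value));
}

// Absolute value with wrap-around semantics: every input maps to a
// non-negative result except INT_MIN, which maps to itself.
inline int32_t wrappingAbs(int32_t value)
{
    return value < 0 ? wrappingNegate(value) : value;
}
```

Any range reasoning about the result of an absolute value therefore has to allow for `INT_MIN` when the input range includes it.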
Migrate warnings as error queries outside of create_omr_compiler_library
Downstream projects may want to override warnings as errors for compiler libraries. Specifically, when compiling NASM sources on Windows Cygwin we want to avoid adding the warnings as error options via the `target_compile_options` API because NASM on Cygwin does not accept the `/WX` flag which Visual Studio wants to apply. In such cases the assembly sources are separated into their own compiler library and warnings as errors are disabled before creating the library such that the aforementioned flag is not applied. (commit: 31bfe55)
The flag is not compatible with all sources within a target. When we mix assembly and C/C++ sources the warnings as error flag will be applied to assembly sources as well, and the assembler used may not support this flag. Because the flag is C/C++ compiler specific at the moment, we only add the flag for those targets. (commit: 9e907fb)
In a previous commit we erroneously changed the constraint for some VP handlers to address warnings in the area. We are isolating this fix because the previous changes were functionally incorrect. We address the warning instead with a template overload for `createIntRangeConstraint`.
See the following review comment for further discussion: https://github.com/eclipse/omr/pull/6030#discussion_r641864997 (commit: b2e3cb5)
Ignore unreachable paths in TR::GlobalValuePropagation::mergeDefinedOnAllPaths
When we are merging "defined on all" paths information from block predecessors, we need to ignore the ones that have been proven unreachable and therefore don't have that information calculated. Since it's an intersection of the info from all predecessors, resulting info becomes too conservative and does not allow propagation that could've happened otherwise. (commit: df01f62)
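A minimal sketch of the idea, assuming a hypothetical `PredecessorInfo` record and an 8-bit set standing in for the real data structures: the merge is an intersection, so a predecessor with no computed information must be skipped rather than intersected in as an empty set.

```cpp
#include <bitset>
#include <vector>

// Hypothetical per-predecessor record; the bit set is only meaningful
// when the predecessor is reachable.
struct PredecessorInfo
{
    bool reachable;
    std::bitset<8> definedOnAllPaths;
};

// Intersects the information from reachable predecessors only.
inline std::bitset<8> mergeDefinedOnAllPaths(const std::vector<PredecessorInfo> &predecessors)
{
    std::bitset<8> merged;
    bool seenReachable = false;
    for (const PredecessorInfo &pred : predecessors) {
        if (!pred.reachable)
            continue; // nothing was calculated here; it must not narrow the result
        if (!seenReachable) {
            merged = pred.definedOnAllPaths;
            seenReachable = true;
        } else {
            merged &= pred.definedOnAllPaths;
        }
    }
    return merged;
}
```

If the unreachable predecessor's empty set were included, the intersection would collapse to nothing and block propagation that is otherwise valid.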
Inherit `OMR::InstOpCodeConnector` like on all other platforms. We had to adopt the use of the common pseudo-instructions in the ARM codegen to make this change. (commit: bf65603)
These classes will form the base of the transition from `ARMOpCodes` and `ARMOpCode` to the new cross platform `TR::InstOpCode`. For now much of the implementation is left empty as we try to slowly add functionality from ARMOps.hpp. (commit: faaca57)
This class can now be replaced by `OMR::ARM::InstOpCode`. We simply copy over all the APIs from the former to the latter. There is no change in functionality following this commit.
Further we remove OpBinary.cpp and OpProperties.cpp and replace it with OMRInstOpCode.cpp. (commit: 088d5c1)
Alias TR_ARMOpCodes with OMR::InstOpCode::Mnemonic
We also introduce a new file temporarily to make the transition easier. This new file `#define` all the existing global scoped mnemonics to map to their `OMR::InstOpCode::Mnemonic` equivalents. We will eventually get rid of this file as we migrate instructions from the global scope. (commit: b50d71b)
LE added a CEEPCB_3164 bit on CEEPCBFLAG6 flag within Process Control Block (PCB) to indicate support for CEL4RO31/64. Update omr_cel4ro31_isSupported() and sl_testOpen31bitDLLviaCEL4RO31 to use this new bit.
Add extern "C" to sltestlib31's function to ensure name is not C++ mangled.
Refactor opcode parameterizations in preparation for migration
To migrate some of the parameterized opcodes we eliminate the template definitions in favour of macros which we will move into the `InstOpcode` class in the following commit. This change should not have any effect on the semantics of the code and it is purely stylistic.
The previous `#define` definitions of these mnemonics would not work because of the C preprocessor, so we could not properly qualify the mnemonics with `TR::InstOpCode::` in code because the function definitions of the parameterized mnemonics do not exist in that class. We have to have actual functions there for the name resolution to work, so we had to move away from the `#define` to an inline function. (commit: 0b28027)
The evaluators removed in this commit are OpenJ9 specific and were previously being declared in OpenJ9. This commit moves the definitions down to OpenJ9 as well. The evaluators that cannot be moved down are now defined in the OMRTreeEvaluator class itself.
RISC-V: fix NULLing of register dependencies in `RVSystemLinkage::buildArgs()`
The original code was copied from AArch64 backend but was not properly updated to RISC-V. This commit fixes that by properly NULLing dependencies for all unused argument registers and for all non-preserved registers (both, GPRs and FPRs) (commit: c508805)
RISC-V: fix passing FP arguments when using system linkage
As specified in the RISC-V PSABI spec [1], if there are more than 8 FP arguments, they're passed the same way as integer arguments, that is, in the a0-a7 GPRs if available.
This commit fixes RVSystemLinkage to comply with the RISC-V PSABI regarding FP arguments.
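The rule can be sketched as a small register assignment routine. This is a simplified model and not the `RVSystemLinkage` code: `ArgKind` and `assignArgumentLocations` are hypothetical names, and varargs, aggregates and arguments wider than a register are ignored.

```cpp
#include <string>
#include <vector>

enum class ArgKind { Integer, Float };

// Assigns each scalar argument a location: FP arguments use fa0-fa7 first,
// and once those are exhausted they fall back to the integer rule
// (a0-a7 if available, otherwise the stack).
inline std::vector<std::string> assignArgumentLocations(const std::vector<ArgKind> &args)
{
    const int numArgRegisters = 8;
    int nextGpr = 0;
    int nextFpr = 0;
    std::vector<std::string> locations;

    for (ArgKind kind : args) {
        if (kind == ArgKind::Float && nextFpr < numArgRegisters)
            locations.push_back("fa" + std::to_string(nextFpr++));
        else if (nextGpr < numArgRegisters)
            locations.push_back("a" + std::to_string(nextGpr++));
        else
            locations.push_back("stack");
    }
    return locations;
}
```

With nine FP arguments and no integer arguments, the first eight land in `fa0` to `fa7` and the ninth in `a0`.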
Normally, direct calls to other methods are not supported because of missing support for trampolines (at OMR level). However, this is a severe limitation because most JITs call runtime services (helpers).
To work around this limitation, if a direct call at isel time calls a resolved method whose address is already known, load that address and use an indirect jump (`jalr`) to make the call. While this may be wasteful if caller and callee are close enough (to fit in the `jal` immediate), it is safe(r).
This allows for some demos to be made :-)
Signed-off-by: Jan Vrany <jan.vrany@fit.cvut.cz> (commit: fc21c26)
This commit changes the code sequence used to load 64-bit constants. It first loads the high 32-bit value into the target register's low 32 bits and then uses a sequence of slli + addi instructions to load the lower 32 bits.
The advantage of this approach is that this sequence uses only one (target) register whereas the previous one required an additional scratch register. (commit: 4162462)
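One plausible shape of such a sequence can be checked by emulating it on a single 64-bit value. The split of the low word into chunks of 11, 11 and 10 bits is an assumption made here so that every `addi` immediate is non-negative and fits the 12-bit signed field; the commit may split the bits differently. `materialize64` is a hypothetical name.

```cpp
#include <cstdint>

// Emulates building a 64-bit constant in one register:
//   1. the high 32 bits are placed in the low half of the register
//      (sign-extended, as a 32-bit load on RV64 would leave them);
//   2. the low 32 bits are shifted in with three slli + addi pairs.
inline uint64_t materialize64(uint64_t constant)
{
    const uint32_t high = static_cast<uint32_t>(constant >> 32);
    const uint32_t low = static_cast<uint32_t>(constant);

    // Step 1: sign-extended high word in the register.
    uint64_t reg = static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(high)));

    // Step 2: each chunk is below 2048, so it is a valid addi immediate.
    reg = (reg << 11) + ((low >> 21) & 0x7FF); // slli 11; addi bits 31..21
    reg = (reg << 11) + ((low >> 10) & 0x7FF); // slli 11; addi bits 20..10
    reg = (reg << 10) + (low & 0x3FF);         // slli 10; addi bits 9..0
    return reg;
}
```

The sign-extension bits from step 1 are shifted out by the total shift of 32, so the register ends up holding exactly the requested constant without a scratch register.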
New static method initializeFreeMemoryProfileMaxSizeClasses
The global variables largeObjectAllocationProfilingVeryLargeObjectThreshold, largeObjectAllocationProfilingVeryLargeObjectSizeClass and freeMemoryProfileMaxSizeClasses are initialized during MM_LargeObjectAllocateStats construction. The new static method initializeFreeMemoryProfileMaxSizeClasses is created to avoid the case where the global variables are used before they are initialized.
RISC-V: use either `auto` or explicitly-sized integer types
In order to follow compiler convention, this commit changes uses of `int` to either `auto` (where correct type can be inferred) or to explicitly-sized integer type. (commit: 8a7f2c0)
Enhance warnings as error flag to be language specific
We start off by replacing `OMR_WARNING_AS_ERROR_FLAG` with a definition per language, i.e. `OMR_<LANG>_WARNINGS_AS_ERROR_FLAG`. This lets us define custom flags per compiler toolchain. More specifically for cross compilations, ex. Cygwin, we can supply a custom flag for NASM file compilations correctly.
We will add subsequent flags for other languages in new commits. (commit: 1a5a46e)
This commit accommodates the implementation of the vector splat operation and respective opcodes for the data types- - VectorInt8 - VectorInt16 - VectorInt32 - VectorInt64 - VectorFloat - VectorDouble
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 7bac47e)
Copy is now an inline function which calls copyForVariant, which performs the previous functionality for copy. copyForVariant is a template function that is told the variant of template that it is to use, either CS or STW. This dictates if copyForVariant should exhibit Concurrent Scavenger behaviour or Stop The World behaviour.
The enum CopyVariant is introduced to represent the different variants of copyForVariant. The two options are STW for IS_CONCURRENT_ENABLED == false, and CS for IS_CONCURRENT_ENABLED == true.
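The dispatch described above might look roughly like the following sketch. The function bodies are placeholders that return a label, and the `isConcurrentEnabled` parameter is a hypothetical stand-in for the `IS_CONCURRENT_ENABLED` check; only the names `copy`, `copyForVariant`, `CopyVariant`, `STW` and `CS` come from the commit message.

```cpp
#include <string>

enum CopyVariant { STW, CS };

// The variant is a template parameter, so each instantiation contains only
// the behaviour it needs and the branch below is resolved at compile time.
template <CopyVariant variant>
inline std::string copyForVariant()
{
    if (variant == CS)
        return "concurrent scavenger copy"; // placeholder for CS behaviour
    return "stop-the-world copy";           // placeholder for STW behaviour
}

// Thin inline wrapper that selects the variant.
inline std::string copy(bool isConcurrentEnabled)
{
    return isConcurrentEnabled ? copyForVariant<CS>() : copyForVariant<STW>();
}
```

Callers keep using `copy`, while the variant-specific logic lives in one templated body.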
This commit accommodates changes needed to enable the vectorization for AArch64. The following changes are made- - Called `setSupportsAutoSIMD()` in code generator. - Implemented `getSupportsOpCodeForAutoSIMD()`.
Signed-off-by: Md. Alvee Noor <mnoor@unb.ca> (commit: 27bd967)
Get real MacOS product version if in compatibility mode
If a program built with a 10.15 or earlier SDK is run on Big Sur (version 11) or newer, it will run in a backwards compatibility context which will return the product version as 10.16.
Thus, to get the Mac OS version when running on 11+, we need to use a workaround to get the real version string. In this case, we look at a symlink to the SystemVersion.plist file which bypasses the backwards compatibility context.
Signed-off-by: Mike Zhang <mike.h.zhang@ibm.com> (commit: 87fba64)
Simplify the logic for identifying minimum free size for sweeping
New variable _minFreeSize in MM_ParallelSweepChunk keeps the minimum free entry size for the memory pool being swept. This avoids the callback check and retrieving the minimum free entry size from the memory pool every time.
for Default case: minimum free entry size = memorypool.minimumFreeEntrySize
Version HCR guards even if they are loop transfer candidates
Once versioner has decided to use loop transfer for a guard, there is a guarantee that the guard, if it is still present in the hot loop, will exit the loop if taken. The guard can still be versioned (and removed entirely from the loop), in which case there will no longer be any possibility that it could be taken at all. So the taken edge of the guard can always be disregarded when searching the loop for operations that will remain in the hot loop after versioning. To make such searches more precise, loop transfer candidates are automatically included in _definitelyRemovableNodes and _optimisticallyRemovableNodes.
One such search of the loop determines whether or not HCR guards can be versioned, and also finds the HCR guards to be versioned (if possible). By ignoring removable nodes before checking whether they are HCR guards, this search has been failing to find HCR guards that are also loop transfer candidates (in the current loop). As a result, versioner has been failing to version those HCR guards even when the analysis concludes that it is safe to do so (and even though versioning is preferable to loop transfer).
With this commit, the HCR guard search will now check for an HCR guard before skipping the node (in case it is removable). This way, HCR guards will be identified and (if possible) versioned regardless of whether they are also loop transfer candidates.
Note that this change only affects HCR guards that are loop transfer candidates in the *current* loop. It's possible to have guards that appear within the current loop, and which are also loop transfer candidates, but which are only considered for loop transfer in an outer loop because they target blocks that are already outside of the current loop. Such guards are not included in _definitelyRemovableNodes and _optimisticallyRemovableNodes (and because their targets are outside of the current loop, including them wouldn't be helpful), so they would be found and versioned appropriately even before this commit. (commit: f54d201)
- Remove unnecessary end-of-line characters from TR_ASSERT() messages - Include header files in alphabetical order in OMRTreeEvaluator.cpp - Change the indentation of a line in OMRTreeEvaluator.cpp
This commit adds a new verbose option `TR_VerboseJITServerConns`. The option is used by OpenJ9 JITServer to log connection events between servers and clients.
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: 90f845c)
Disable -Winvalid-offsetof in OMROptions and FEBase.cpp
See #1662 for some further details. This will not be easy to solve because it is not straightforward to convert `TR::JitConfig` to a POD so we can use `offsetof`. This will hopefully be solved via the options framework overhaul, so fixing this warning properly is not worth the effort at the moment. Instead we locally disable this warning around the problematic area. (commit: 5521972)
Add a new query to the TR::Compilation class to allow the compiler to query whether the runtime is some kind of mode (such as snapshot/restore) that would necessitate the compiler to generate code that is portable (ie generated on one system but can run on another). This is different from relocatable compiles because, for example with snapshot/restore, the code does not need to be relocated but it does need to be portable.
``` CCN5562 (W) No code is generated for the asm statement. The asm statement is ignored. ```
Assembly generation for C/C++ code is disabled by default unless the `-qasm` flag is set. The above warning was telling us the inline assembly code was not generating anything, which is definitely not what we want. Enabling `-qasm` further exposes bugs with the syntax for inline assembly on xlC which is why the changes in AtomicSupport.cpp were needed. (commit: 08d39e6)
If we are building with `-Wc,"convlit(ISO8859-1)"` xlC option then we want to avoid doing literal conversion of inline assembly statements because HLASM expects them to be in EBCDIC. Failing to do this results in compilation errors with HLASM as it tries to interpret ASCII characters of the inline assembly string.
This pragma is used elsewhere in the codebase so it is not something new. (commit: 6ab9b6d)
AArch64: Use add/sub shifted register instruction if possible
Use add (shifted register) instruction if add operation can be encoded into this instruction. Similarly, use sub (shifted register) instruction if sub operation can be encoded into this instruction.
Use getVirtualCallNodeForGuard() during ValuePropagation - replace existing code in constrainIfcmpeqne by a call to getVirtualCallNodeForGuard() (commit: b7314a8)
Dumping core using the signal requires the task-get-allow entitlement, which prevents notarization. Thus we dump the core with a user space tool which does not require an entitlement.
Signed-off-by: Mike Zhang <mike.h.zhang@ibm.com> (commit: cfed42b)
Assert that labels are defined when applying label relocations
A label has a null address until its location is determined when it is encountered in the instruction stream during binary encoding. When the compiler generates a jump to a label whose address has not yet been determined, it also generates a label relocation to correct the jump target once the label's address is known after binary encoding. But if for some reason the label's address is never updated, the relocation would silently continue using zero and generate an incorrect jump.
Because such jumps can be hard to debug, this commit adds assertions that verify that the label has at least had some address set, otherwise failing fast. Note that zero cannot be the true address of a label, since we never generate instructions there. (commit: 43323d2)
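The fail-fast check described above can be sketched roughly as follows. `LabelSketch` and `applyLabelRelocation` are hypothetical names, and an exception stands in for the compiler's assertion only so that the failure is easy to observe; this is not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a code generator label.
struct LabelSketch
{
    const uint8_t *codeLocation = nullptr; // null until binary encoding reaches the label
};

// Computes the branch displacement for a relocation, failing fast when the
// label was never assigned an address instead of silently using zero.
inline std::ptrdiff_t applyLabelRelocation(const uint8_t *branchSite, const LabelSketch &label)
{
    if (label.codeLocation == nullptr)
        throw std::logic_error("label relocation applied to a label with no address");
    return label.codeLocation - branchSite;
}
```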
When assigning a virtual register to a target real register during register assignment for an instruction, the register exchange API, which takes care of exchanging the target real register and the currently assigned register, bails out when the two registers differ in size and falls back to a path that needs a spare register. This can cause us to run out of registers. To exchange the target real register and the currently assigned register when either one is 64-bit, we can simply use an 8-byte spill slot. This commit upgrades the register exchange API to do that.
AArch64: Use the immediate offset of ldr/str instructions
This commit changes populateMemoryReference() so that the compiler generates ldr/str instructions with constant immediate offsets in accessing arrays with constant indexes.
AArch64: Use negated constant value for add/sub node if it is more concise
This commit changes `genericBinaryEvaluator` to use the negated value for an add/sub operation if the negated value can be loaded into a register with fewer instructions than the original value. Also, `genericBinaryEvaluator` is changed to set a register to the constant node if the constant cannot be encoded into the instruction and the negated value is not used. This change prevents constant loading instructions from being generated multiple times even if the constant node has multiple reference counts.
AArch64: Use mov bitmask immediate instruction if possible
This commit introduces ARM64Trg1ZeroImmInstruction class which is required to use `mov` (bitmask immediate) instruction. Also, loadConstant32 and loadConstant64 helper functions are changed to use that instruction if useful.
Modified Concurrent Scavenger conditions to use compiler optimization
Reason for condition change: - allowDuplicate bool is only set to true if CS is enabled. Compiler can skip condition check for allowDuplicate if OMR_GC_CONCURRENT_SCAVENGER is enabled but CS is disabled. - nested if-forwarding-succeeded check succeeds in forwarding the object if CS is disabled. This is for the case where OMR_GC_CONCURRENT_SCAVENGER is enabled but CS is disabled. Compiler can go directly into succeeded logic - if OMR_GC_CONCURRENT_SCAVENGER is enabled but CS is disabled, there's no need to check allowDuplicateOrConcurrentDisabled, so we skip the block.
Cleanup softmx check and apply softmx to nursery expansion
At the moment, heap expansion logic allows nursery size to expand well beyond the softmx limit. Additionally, when the total heap size was above the softmx size, the softmx size was effectively ignored.
This PR fixes both behaviours mentioned above, and makes softmx limit apply to both nursery and tenure (ie, tenure + nursery < softmx).
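A minimal sketch of the invariant, assuming a hypothetical helper `clampExpansionToSoftMx` that limits a requested expansion so that tenure plus nursery stays within softmx. Alignment, and the case where no softmx is set, are deliberately ignored here; the actual resize logic is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns how much of `requested` may be granted
// without the total heap (tenure + nursery) exceeding `softMx`.
inline std::size_t clampExpansionToSoftMx(std::size_t requested,
                                          std::size_t tenureSize,
                                          std::size_t nurserySize,
                                          std::size_t softMx)
{
    const std::size_t current = tenureSize + nurserySize;
    if (current >= softMx)
        return 0; // already at or above the limit: no expansion at all
    return std::min(requested, softMx - current);
}
```

With a 512-unit softmx, a 300-unit tenure and a 100-unit nursery, a request for 200 units would be reduced to 112 under this sketch.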
Additionally, this PR does the following: - simplified initialization of `desiredContractionFactor` and `adjustedContractionFactor` in `MemorySubSpaceSemiSpace::checkSubSpaceMemoryPostCollectResize()`. Same idea for `desiredExpansionFactor` and `adjustedExpansionFactor` just above - refactoring `if(` to `if (`(with space) and `){` to `) {` (with space) in `MemorySubSpaceSemiSpace` and `MemorySubSpaceUnispace`
This PR is a continuation of #5728. Please see it for initial PR feedback on the initial set of changes that was done
Interpret field offsets in 'block' forms as expressions
Using the GNU compiler option '-gdwarf-2' means that field offset expressions cannot use 'DW_FORM_exprloc' because that form was not introduced until DWARF version 4; so the compiler instead uses 'DW_FORM_block*'.
Signed-off-by: Keith W. Campbell <keithc@ca.ibm.com> (commit: 9a92eee)
- TR::[v]printfLen(): return the length of the formatted string without writing the string into any buffer
- TR::[v]snprintfTrunc(): format into a fixed-size buffer, possibly truncating
- TR::[v]snprintfNoTrunc(): format into a fixed-size buffer with an assertion that the result is not truncated
- TR::StringBuf: a reusable growable Region-allocated buffer into which formatted text can be appended using [v]appendf()
These obviate the MSVC conditional snprintf #define at use sites, and they deal with the portability concerns of that #define correctly and in a centralized place. Additionally, StringBuf makes it simple to allocate enough memory to do formatting without truncation. (commit: cc0a9b8)
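The behaviour of the first three helpers can be approximated on top of the standard library as below. This is only a sketch assuming a C99-conformant `vsnprintf`; the `...Sketch` names, signatures and return conventions are assumptions and may differ from the real `TR::` functions.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Length of the formatted string, excluding the terminating NUL.
// With a null buffer and zero size, vsnprintf writes nothing.
inline int vprintfLenSketch(const char *fmt, va_list args)
{
    return std::vsnprintf(nullptr, 0, fmt, args);
}

inline int printfLenSketch(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int len = vprintfLenSketch(fmt, args);
    va_end(args);
    return len;
}

// Formats into buf, truncating if necessary. Returns the number of
// characters actually stored, excluding the terminating NUL.
inline int snprintfTruncSketch(char *buf, std::size_t size, const char *fmt, ...)
{
    assert(size > 0);
    va_list args;
    va_start(args, fmt);
    int needed = std::vsnprintf(buf, size, fmt, args);
    va_end(args);
    assert(needed >= 0);
    std::size_t stored = static_cast<std::size_t>(needed);
    return static_cast<int>(stored < size ? stored : size - 1);
}

// Like the truncating variant, but asserts that the whole result fit.
inline int snprintfNoTruncSketch(char *buf, std::size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int needed = std::vsnprintf(buf, size, fmt, args);
    va_end(args);
    assert(needed >= 0 && static_cast<std::size_t>(needed) < size);
    return needed;
}
```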
The TR_AOT option is legacy code; generally whether or not to AOT (in downstream projects) is controlled by a runtime option rather than an -Xjit or -Xaot option.
Reset _defMergedNodes in global VP on every basic block
_defMergedNodes is supposed to keep track of commonned nodes. So it's safe to reset it on every basic block. Also add an assert that there are no extended blocks during global VP. (commit: f79041a)
Disable transforming VFTLoad to loadaddr on Power for nonSVM AOT
Due to missing external relocation record, disable transforming VFT load of known object with fixed class to loadaddr on Power when compiling relocatable code without Symbol Validation Manager.
Signed-off-by: Rahil Shah <rahil@ca.ibm.com> (commit: ba6f751)
This change switches the order between updating the INIT table (tuneToHeap call) and setting mark bits in place. This way, we guarantee that MM bits are set in place when we fail to update the INIT table (or it's not updated in time) given an overlap of KO and Tenure Expand.
The issue being resolved is similar to what has been outlined in a previous issue [https://github.com/eclipse-openj9/openj9/issues/8020#issuecomment-573843125], where Mark Map Bits get missed for init. Here, we have another similar timing hole, when one thread kicks off concurrent, Concurrent_OFF -> (Concurrent_INIT or INIT Complete), while the expanding thread is in the middle of heapReconfigured.
With the original ordering of heapReconfigured, we first attempt to set the mark bits in place when Concurrent is ON. With Concurrent_OFF, we delay setting the bits until Concurrent_INIT. This requires an update to the init table. Hence, for Concurrent_OFF we forgo setting bits and update the init table, expecting `tuneToHeap` to do `determineInitRanges` to update the init ranges table. The issue occurs when we don't set mark bits in place (given Concurrent_OFF) but concurrent starts after the check and prior to updating the init ranges. Here, we either don't update the init table (since init is in progress) or fail to update it in time.
With this reordering of heapReconfigured, we try to set mark bits in place only after the check to update the init range table. With this, any state transitions resulting in the init table not being updated (or being updated too late) will be caught and accounted for, as the bits will be set afterwards.
The warning complains about passing an argument of non-POD type through the following helper defined by Google Test:
``` static char (&Helper(...))[2]; // NOLINT ```
xlC for some reason does not like passing the non-POD type through this helper, although it appears to work correctly. This helper is used for statically determining whether types are implicitly convertible so all we care about is whether `sizeof` this helper is 1 or 2. Whether the type is passed through ellipsis or as a parameter does not matter in this case and hence we can just silence the warning on xlC.
``` CCN8924 (W) Cannot pass an argument of non-POD class type "const ASTNodeArg" through ellipsis. ``` (commit: ee53df8)
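For readers unfamiliar with the `sizeof` trick mentioned above, here is a simplified sketch of how such a helper pair can detect implicit convertibility at compile time. The class and member names are illustrative and only loosely modelled on the Google Test helper; nothing is ever called at run time, since the call appears only inside `sizeof`.

```cpp
#include <cassert>

template <typename From, typename To>
class ImplicitlyConvertibleSketch
{
    static From &makeFrom();        // declared only, used in an unevaluated context
    static char helper(To);         // chosen when From converts implicitly to To
    static char (&helper(...))[2];  // fallback through ellipsis, returns char[2]

public:
    // sizeof is 1 if the To overload was selected, 2 for the ellipsis overload.
    static constexpr bool value = sizeof(helper(makeFrom())) == 1;
};

static_assert(ImplicitlyConvertibleSketch<int, double>::value,
              "int converts implicitly to double");
static_assert(!ImplicitlyConvertibleSketch<void *, int>::value,
              "void* does not convert implicitly to int");
```

Because overload resolution happens in an unevaluated operand, the argument is never actually passed through the ellipsis, which is why silencing the warning is reasonable.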
If we are building using the native codepage (EBCDIC) we do not need to include this header. Otherwise `OMR_EBCDIC` will be defined and we will be linking with the atoe library so this header will be on the include path and we can safely include it.
``` WARNING CCN3068 /u/jenkins/fjeremic/omr/port/zos390/omrsyslog.c:108 Operation between types "char*" and "const char*" is not allowed. ``` (commit: 859c4bf)
``` WARNING CCN3280 /u/jenkins/fjeremic/omr/thread/common/thrprof.c:245 Function argument assignment between types "unsigned int*" and "int*" is not allowed. ``` (commit: 926386e)
Rename LightweightNonReentrantReaderWriterLock to avoid HLASM errors
If the file name of the compilation unit is greater than 38 characters and the compilation unit uses inline assembly HLASM will fail with the following message:
``` INFORMATIONAL CCN1149: Inline assembly statements caused HLASM RC=4 with the following messages: WARNING CCN1150: Inline assembly statements caused HLASM RC=4. ```
The listing file says the assembler has succeeded so this error message is quite puzzling and seems to go away when the file name is shortened. It is likely the CSECT name is too long for HLASM to handle, although changing the CSECT name does not alleviate the problem.
The simplest solution seems to be to truncate the name a little bit which is what we've done here. (commit: cc1d056)
The z/OS xlC v2.2 compiler does not support the `-Werror` option that is available starting with the v2.3 compiler version. Instead it does support the `-qhalt=4` option which according to the documentation:
> Stops compilation process of a set of source code before producing any object, executable, or assembler source files if the maximum severity of compile-time messages equals or exceeds the severity specified for this option.
Adding this option to the command line unfortunately does not achieve the desired effect of enabling warnings as errors since compilations appear to complete with rc=0. The reason for this is because `HALT` is a batch option, not a USS one. To get the desired effect we have to specify two environment variables which when active will treat warnings as errors and return a non-zero return code. The two options are:
for C and C++ respectively. Until we can upgrade to the v2.3 compiler we have to run with these environment variables active to get the desired effect. (commit: 13463c9)
Ensure correct instr construction with neg imm using AIX Assembler
The AIX assembler performs the bitwise AND and then negates the value when `dq` is a negative value with a `-` sign. The parentheses ensure the immediate is negated before the AND operation is performed.
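The difference between the two evaluation orders can be seen with plain integer arithmetic. The mask `0xFFFF` and the magnitude 16 below are made-up values for illustration, not the operands used in the actual instruction macro, and the sketch assumes two's complement integers.

```cpp
#include <cassert>
#include <cstdint>

// AND first, then negate: the order the description attributes to the
// assembler when no parentheses are used.
constexpr int32_t andThenNegate(int32_t magnitude, int32_t mask)
{
    return -(magnitude & mask);
}

// Negate first, then AND: the order the added parentheses enforce.
constexpr int32_t negateThenAnd(int32_t magnitude, int32_t mask)
{
    return (-magnitude) & mask;
}

static_assert(andThenNegate(16, 0xFFFF) == -16, "AND applied before negation");
static_assert(negateThenAnd(16, 0xFFFF) == 0xFFF0, "negation applied before AND");
```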
Allow known object information to be associated with arbitrary nodes. This will permit more accurate association of refined known object info, and allow it to be placed on arbitrary nodes.
A new 32-bit bit field is added to the Node class in the padding space between two fields. This incurs no TR::Node footprint increase on 64-bit architectures.
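Why a field placed in existing padding is free on 64-bit targets can be shown with two toy structs. The layouts below are hypothetical and are not the real `TR::Node` layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout before the change: on typical 64-bit ABIs, 4 bytes of
// padding follow `flags` so that `next` is 8-byte aligned.
struct NodeBeforeSketch
{
    void *first;
    uint32_t flags;
    void *next;
};

// Same layout with a new 32-bit field occupying the former padding.
struct NodeAfterSketch
{
    void *first;
    uint32_t flags;
    uint32_t knownObjectInfo; // hypothetical name for the new field
    void *next;
};

// Only claimed for targets with 8-byte, 8-byte-aligned pointers.
static_assert(sizeof(void *) != 8 || alignof(void *) != 8 ||
              sizeof(NodeAfterSketch) == sizeof(NodeBeforeSketch),
              "new field fits in padding on 64-bit targets");
```

On 32-bit targets there is usually no such padding, so the same addition would grow the struct by 4 bytes.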
Fix GC scavengerScanOrdering Not Initialized Correctly
Fix GC scavengerScanOrdering not initialized correctly when gc policy is gencon and splitFreeListSplitAmount is specified to be a value other than zero.
Signed-off-by: Jonathan Oommen <jon.oommen@gmail.com> (commit: 266d49d)
Prevent compacting NULLCHKs into computed call trees
Compacting a NULLCHK into a computed call will result in the NULLCHK not being performed before dispatching into the call target if there is no need to dereference the reference child to dispatch.
Also includes some formatting fixes.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: 2805e95)
Disable verifyGlobalIndices() by default because the allocation below of nodesByGlobalIndex exposes a leak in TR::Region / TR::SegmentProvider when there are many nodes. This check has been effectively doing nothing in most builds anyway because it just traverses the trees doing TR_ASSERT(). (commit: 8efff88)
Define nonNullableArrayNullStoreCheck non-helper symbol
Downstream projects might define classes that do not permit null references. This change defines a non-helper symbol that takes an array reference and a value reference as arguments. A downstream project can use this as a placeholder for a check of whether the component type of an array is a class that does not permit null references, and if so, whether the value that is being assigned to an element of the array is a null reference.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 0175216)
AArch64: Add relocation record for second child of ifacmpeq/ifacmpne
A relocation record for the class or method pointer in the second child of `ifacmpeq`/`ifacmpne` is missing if its constant value can be encoded into the 12-bit immediate field. This commit adds a relocation record for it by evaluating the node if it needs relocation.
Handle checking the readability of memory segments from vm_region_recurse_64 and writing the readable segments in one step since multiple calls to mach_vm_read on the same address seems to cause errors.
Signed-off-by: Mike Zhang <mike.h.zhang@ibm.com> (commit: d064ed6)
``` IEW2646W 5383 ESD RMODE(24) CONFLICTS WITH USER-SPECIFIED RMODE(ANY) FOR SECTION $PRIV00007C CLASS B_TEXT. FSUM3065 The LINKEDIT step ended with return code 4. ```
Taking the command line for the link step of an executable which prints this message, e.g. omrporttest, we can add the `-Wl,MAP` option to print some debugging diagnostics. From here we can see the offending section:
``` SECTION CLASS ------- SOURCE -------- OFFSET OFFSET NAME TYPE LENGTH DDNAME SEQ MEMBER ... ... 21A0 $PRIV00007C * CSECT 10 /000001D 01 omrgetstf-64.s.o 0 21A0 getstfle LABEL ```
Looking at the contents of this file though we note that the actual CSECT starts halfway through the file. The fact that the section was named "$PRIV00007C" must mean that it is referring to the code before the CSECT. Hence the way to fix this problem is to define the CSECT right at the start of the file and to define RMODE as ANY since it defaults to RMODE 24 if it is not specified as per [1].
To stay consistent with other platforms, internal pointers will be disabled for PPC under OptServer. They were temporarily re-enabled due to a 20% regression on WAS and Liberty, but this occurred nearly seven years ago, so the chances of reproducing it now are low. As of the time of this commit, further investigation is being done into Liberty performance, so if this does expose a bug or regression, it will be caught, and a permanent fix can be implemented.
Avoid symbol reference sharing for dummy resolved methods
It's possible to encounter an unresolved call, then to resolve its CP entry in another thread, and finally encounter another call with the same CP index, which is now resolved. In this situation, sharing is usually prevented due to the !(resolvedMethod && symRef->isUnresolved()) condition. However, a dummy resolved method does not satisfy symRef->isUnresolved().
It's important not to share the symbol reference because the dummy resolved method used in the unresolved case may expect arguments of a different type or a different number of arguments from the real resolved method. (commit: a0db673)
Improve non-overridden guard optimization in globalVP
- handle direct calls, use call site index and bytecode index to find the right call - sometimes, at the time the guard is seen not all constraints are available for the call. Keep a side table that maps call node to the corresponding guard and try to eliminate the guard at the time the call is encountered - support multiple guards pointing to the same block - Check isTheVirtualCallNodeForAGuardedInlinedCall inside getVirtualCallTreeForGuard (commit: c31dd09)
AArch64: Consider snippets size when aborting compilation of huge methods
A workaround for huge methods was introduced in #5819 because the range of AArch64 conditional branch is +/- 1MB. This commit enhances the workaround by taking into account the size of snippets which was disregarded in #5819.
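The idea can be sketched as a simple size check. The function name and the exact policy (a plain sum compared against 1 MiB) are assumptions for illustration; the actual workaround may rely on estimates and safety margins.

```cpp
#include <cassert>
#include <cstddef>

// AArch64 conditional branches reach +/- 1 MiB from the branch instruction.
constexpr std::size_t kCondBranchRangeBytes = std::size_t(1) << 20;

// Hypothetical check: a branch from mainline code to a snippet placed after
// it may be out of range once mainline and snippet sizes are added together.
inline bool mayExceedCondBranchRange(std::size_t mainlineBytes, std::size_t snippetBytes)
{
    return mainlineBytes + snippetBytes > kCondBranchRangeBytes;
}
```

A method with 900 KiB of mainline code passes a mainline-only check, but with 200 KiB of snippets the combined size crosses the limit.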
The constants MIN_PROFILED_CALL_FREQUENCY, TWO_CALL_PROFILED_FREQUENCY, and MAX_FAN_IN are not used in OMR, thus can be removed and redefined inside OpenJ9 compilation units that use them.
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: 4d0ca6e)
Reintroduce 'Cleanup softmx check and apply softmx to nursery expansion'
This reverts commit 73d89ab5bc815be3c15eb31c3ca63337cfe68217. (The original PR is https://github.com/eclipse/omr/pull/6120 , which includes main descriptions of this change)
Also, this commit includes the following modifications:
- Added assert to make sure `adjustExpansionWithinSoftMax` is never operating on a Generic MemorySubSpace - Added flags for Tarok MemorySubSpace type
Fix MPAOL inconsistency between freeBytes and freeList
The inconsistency is caused by failing to update the previous free entry when a free entry is eliminated due to card alignment. This leaves a gap between the previous and current free entries for recycleHeapChunk, which then loses free entries. It could trigger GC allocation assertions in the balanced GC case.
- updated previous free entry for the case - additional assertion check in recycleHeapChunk for verifying the case -- for debugging only.
AArch64: Call redoTrampolineReservationIfNecessary() even if label is not null
The commit changes `TR::ARM64ImmSymInstruction::generateBinaryEncoding` to call `redoTrampolineReservationIfNecessary()` even if label is not null, which happens when a direct call is done through a snippet.
`TR::S390VRRInstruction::generateBinaryEncoding()` has been updated to generate binary encoding for VRR-g, VRR-h and VRR-i. We can now delegate the encoding work to the parent class.
Adding Concurrent Mark timing stats in ConcurrentPhaseStatsBase, and additional flags for heap resizing
This change introduces several flags which are used to keep track of Concurrent marking work timing/cost. This change also introduces a handful of flags which can be used by the GC to improve heap resizing.
- `_concurrentMarkProcessStartTime` keeps track of the cpu time for a concurrent mark increment - Added additional flags for heap expansion and contraction. - Added `MM_HeapSizingData`. This struct keeps track of several key characteristics that can be used by a GC to resize the heap - changed `heapExpansionGCTimeThreshold` and `heapContractionGCTimeThreshold` to `MM_UserSpecifiedParameterUDATA`. Renamed the fields to `heapExpansionGCRatioThreshold` and `heapContractionGCRatioThreshold` respectively. Doing this allows each GC policy to configure their own defaults. - `dnssExpectedTimeRatioMaximum` and `dnssExpectedTimeRatioMinimum` have been changed in the same way as described for `heapExpansionGCTimeThreshold` - Introduction of `tarokTargetMaxPauseTime`. This will serve as the default pause time target for certain GC policies in which a target pause time will be used - `reportHeapResizeAttempt()` now takes a memoryType param. This allows GC policies which are resizing logical spaces to print a `heap-resize` line for those logical spaces - Introduced several tracepoints
Mostly unrelated to the main work in this PR, the following utility function has been added - `MM_Math::weightedAverage()` for doubles has been added
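A weighted average over doubles is typically a one-line blend of the running average and the new sample. The sketch below assumes that `weight` applies to the existing average and `1 - weight` to the new value; the real `MM_Math::weightedAverage()` signature and convention may differ.

```cpp
#include <cassert>

// Hypothetical free-function version: blends the running average with a new
// sample, where `weight` in [0, 1] is the share kept from the old average.
inline double weightedAverageSketch(double currentAverage, double newValue, double weight)
{
    return (currentAverage * weight) + (newValue * (1.0 - weight));
}
```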
This PR does not contain any functional/behavioural changes to the GC
Finally, a follow up PR should be performed, which does the following: - `Trc_MM_MemorySubSpaceTarok_calculateExpandSize_Exit1` should be made obsolete - `Trc_MM_MemorySubSpaceTarok_calculateTargetContractSize_Entry` should be made obsolete - `dnssExpectedTimeRatioMaximum/Minimum` should be removed, while `dnssExpectedRatioMaximum/Minimum` should be kept - `heapExpansionGCTimeThreshold` and `heapContractionGCTimeThreshold` should be removed, and `heapExpansionGCRatioThreshold` and `heapContractionGCRatioThreshold` should be kept.
AArch64: Block registers in regdeps when assigning register for branch instructions
When assigning registers for `cbz/cbnz` instructions, a real register specified in the register dependencies can be unintentionally assigned to the source register. This commit changes `assignRegister` functions to block registers in the register dependencies before assigning the source register.
AArch64: Correctly handle ificmpeq/ificmpne nodes with many global registers
Correctly count the integer or address child nodes added to GlRegDeps node to avoid adding too many integer registers to the register dependencies. Add a src register of compare branch instructions to register dependencies so that a src register is assigned when other registers in the register dependencies are assigned.
Add an unresolved check for mapOpCode() P10 prefix-instr mapping
The current code for patching unresolved addresses does not handle P10's 8-byte prefix instructions. This commit adds a check to mapOpCode so that a memory instruction is not mapped to a prefix instruction on P10 if the reference is to an unresolved symbol.
Restrict rotate instruction optimization for land nodes
This commit limits an optimization in the z codegen that tries to perform a shift + land operation using a single rotate instruction. The optimization skips the evaluation of the land node but evaluates its children. If the reference count of the land node > 1, then this can result in the code generator double decrementing the land node's children. This commit restricts this optimization to when the land node's ref count is equal to 1 so that the above described scenario is no longer possible.
Prevent a SOF in TR_InductionVariableAnalysis::getEntryValue()
In extreme cases the recursion in getEntryValue() can cause a SOF when compiling large methods. This change will prevent the SOF by introducing a depth parameter that will prevent the recursion from exceeding a maximum depth and therefore preventing a SOF condition.
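The depth-limiting pattern described above looks roughly like this. The data structure, the limit of 64 and the function names are invented for illustration and do not reflect the real `getEntryValue()` signature.

```cpp
#include <cassert>
#include <vector>

// Toy definition chain: each node is either a constant (definedBy < 0)
// or takes its value from another node.
struct ValueNodeSketch
{
    int constant;
    int definedBy; // index of the defining node, or -1
};

const int kMaxRecursionDepth = 64; // hypothetical limit

// Returns false (gives up) instead of recursing past the maximum depth.
inline bool getEntryValueSketch(const std::vector<ValueNodeSketch> &nodes,
                                int index, int depth, int &result)
{
    if (depth > kMaxRecursionDepth)
        return false;
    const ValueNodeSketch &node = nodes[index];
    if (node.definedBy < 0)
    {
        result = node.constant;
        return true;
    }
    return getEntryValueSketch(nodes, node.definedBy, depth + 1, result);
}

// Builds a chain where node 0 holds the constant 7 and node i is defined by node i-1.
inline std::vector<ValueNodeSketch> makeChain(int length)
{
    std::vector<ValueNodeSketch> nodes;
    for (int i = 0; i < length; ++i)
        nodes.push_back(ValueNodeSketch{i == 0 ? 7 : 0, i - 1});
    return nodes;
}
```

The caller has to treat a `false` result as "unknown" and skip the optimization, which trades a missed opportunity in extreme cases for a bounded stack.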
Signed-off-by: Kevin Langman <langman@ca.ibm.com> (commit: aaa00fc)
This commit contains the follow up items from https://github.com/eclipse/omr/pull/5825
This includes: - making `Trc_MM_MemorySubSpaceTarok_calculateExpandSize_Exit1` and `Trc_MM_MemorySubSpaceTarok_calculateTargetContractSize_Entry` obsolete - `dnssExpectedTimeRatioMaximum` and `dnssExpectedTimeRatioMinimum` were removed, since they are no longer used. The newer version (dnssExpectedRatioMaximum/Minimum) are more flexible for different GC policies - `heapExpansionGCTimeThreshold` and `heapContractionGCTimeThreshold` were removed, since they are no longer used. The newer versions (`heapExpansionGCRatioThreshold` and `heapContractionGCRatioThreshold`) are more flexible for different GC policies
The existing jitAcmpHelper symbol can be used in downstream projects to represent object reference comparisons for equality. This change introduces a jitAcmpneHelper symbol to represent inequality comparisons rather than treating that as a second-class operation requiring negation to implement.
The change also renames jitAcmpHelper to jitAcmpeqHelper to clarify the purpose of the existing symbol. The old symbol name will be retained until it is no longer being used by downstream projects.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 8cf034e)
This commit removes a suffix "d" from a double constant in C++ code. Suffix "f" is used for a float constant, but double constants do not take any suffix.
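Standard C++ defines no `d` suffix for floating-point literals, so an unsuffixed literal is already a `double`. The constant name below is made up for the example.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype(1.5), double>::value,
              "an unsuffixed floating literal has type double");
static_assert(std::is_same<decltype(1.5f), float>::value,
              "the f suffix selects float");
static_assert(std::is_same<decltype(1.5L), long double>::value,
              "the L suffix selects long double");

// Portable spelling of a double constant: no suffix needed.
const double kScaleSketch = 1.5;
```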
Split _options and _optionString in OptionSet into separate fields
When both AOT and JIT have logs specified and one is processed first, the OptionSet._options for the other one has not been set up properly yet. Since setOptions has not been called for the other command option, getOptions returns a char pointer to _optionString that is implicitly cast to TR::Options * because OptionSet._options and OptionSet._optionString share a union. To properly check whether OptionSet._options is valid, it needs to be a separate field.
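A stripped-down sketch of the separate-field layout follows. The class and member names mirror the description but the types are placeholders, not the real `TR::OptionSet` or `TR::Options`.

```cpp
#include <cassert>

struct OptionsSketch // placeholder for the processed options object
{
    int verboseLevel;
};

class OptionSetSketch
{
public:
    explicit OptionSetSketch(const char *optionString)
        : _options(nullptr), _optionString(optionString) {}

    // Null until setOptions() has run, so callers can test validity directly.
    OptionsSketch *getOptions() const { return _options; }
    const char *getOptionString() const { return _optionString; }
    void setOptions(OptionsSketch *options) { _options = options; }

private:
    // Separate fields: the option string can no longer be misread as options.
    OptionsSketch *_options;
    const char *_optionString;
};
```

With a union, the two pointers share storage and there is no way to tell from the value alone which one is stored; with separate fields a null `_options` unambiguously means "not processed yet".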
Fixes #6189
Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com> (commit: 1aad2a6)
When reading memory segments, reset the return value kr so that the generated file is not deleted at the end if the last memory segment throws an error when being read, since mach_vm_read errors are not fatal.
Also make sure final return value is correct when there is an error.
Signed-off-by: Mike Zhang <mike.h.zhang@ibm.com> (commit: 5891fb4)
Consider guard or instanceof as escape point in Field Privatizer
A reference or store to a field that came from an inlined method might be guarded by a virtual guard. In any particular execution of the loop, such a field might not actually exist in the object that's being referenced. To avoid that situation, this change adds a simple check for the existence of a virtual guard within the loop and considers it to be an escape point in the containsEscapePoints method.
Similarly, instanceof tests might guard casts that will be used to access fields that wouldn't necessarily exist every time a loop is executed, so they are considered escape points as well.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: c51edad)
Remove SupportedForPRE from instanceof for Field Privatizer bug
instanceof operations can be hoisted out of loops by PRE. This could expose a bug in Field Privatizer in which a field access that is guarded might be privatized, resulting in the field being referenced and stored to unconditionally.
A temporary patch in Field Privatizer checks for uses of instanceof and null comparisons inside of loops to guard against the conditions that expose the bug. This change removes the SupportedForPRE property from the instanceof operator to prevent its being hoisted out of a loop, avoiding one more situation that could expose the bug.
Once the bug is fixed properly in Field Privatizer, the SupportedForPRE property will be added back to instanceof.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 94a7471)
Consider comparisons with null to avoid invalid field privatization
A comparison with a null reference might be used to guard a field access. The Field Privatizer does not take such conditional access into account in deciding whether to privatize a field, which can result in its unconditionally copying a field's value into a temporary variable by dereferencing null.
This change adds a very simple check for comparisons with null inside of a loop to help to guard against that situation. That will avoid the bug in most cases, but follow up changes that are more robust, general and accurate in avoiding the problem are planned.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: f9c32ab)
This change defines a non-helper symbol that can be used by downstream projects to test a pair of object references for an inequality comparison. It is intended to represent the negation of the existing <objectEqualityComparison> non-helper symbol.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 995bc2a)
Handle aliasing of <objectInequalityComparisonSymbol> non-helper
Ensure the aliasing of <objectInequalityComparisonSymbol> in OMR::SymbolReference::getUseDefAliasesBV is handled in the same way as that for <objectEqualityComparisonSymbol>.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 5ea759c)
These new JIT helpers are OpenJ9-specific but unfortunately they still need to be declared here.
The definition of TR_jitLookupDynamicPublicInterfaceMethod uses TR_newArray+2 instead of TR_jitLookupDynamicInterfaceMethod+1 because the latter causes s390m4check.pl to complain that there is a "line longer than 71 characters". It's difficult to add this comment inline next to the definition itself because Helpers.inc is included directly into all of the following file types: - C++ header - CPP-preprocessed assembly - m4-preprocessed assembly (commit: 31a422d)
This commit adds a new flag to the TR_OptimizationPlan data structure, called DisableEDO. The new flag will be used in an upstream project to control whether EDO (Exception Directed Optimization) is allowed for the current compilation.
Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com> (commit: c3dc8a4)
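As a rough illustration of how such a flag might be stored and queried, here is a minimal sketch. The class name `OptimizationPlanSketch`, the accessor names, and the bit values are all assumptions made for this example; they are not the actual TR_OptimizationPlan definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for an optimization plan that keeps its options
// in a bit set. Flag values are illustrative only.
class OptimizationPlanSketch
   {
private:
   enum Flags : uint32_t
      {
      SomeExistingFlag = 0x1,   // placeholder for a pre-existing flag
      DisableEDO       = 0x2    // the newly added flag
      };

   uint32_t _flags;

   void setFlag(uint32_t flag, bool value)
      {
      if (value)
         _flags |= flag;
      else
         _flags &= ~flag;
      }

public:
   OptimizationPlanSketch() : _flags(0) {}

   void setDisableEDO(bool b) { setFlag(DisableEDO, b); }
   bool disableEDO() const { return (_flags & DisableEDO) != 0; }

   void setSomeExistingFlag(bool b) { setFlag(SomeExistingFlag, b); }
   bool someExistingFlag() const { return (_flags & SomeExistingFlag) != 0; }
   };
```

A consumer would set the flag on the plan before the compilation starts and query it wherever EDO would otherwise be enabled.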
Use a more specific implied location for intersection in VP
Any value known to satisfy the VPClassType constraint is an object, so it must be a reflective object rather than a VM-internal representation. (commit: 85d2e95)
Avoid combining a regular type bound with a ClassObject location in VP
...as the intersection of the two constraints. If the type is not already known to be a ClassObject, then bundling it into a VPClass with a ClassObject location will incorrectly change the meaning of the type constraint. (commit: d4ba4d4)
The definition of the helper index TR_acmpHelper was made equal to the index TR_acmpeqHelper when TR_acmpneHelper was introduced. Now that references to TR_acmpHelper have been replaced with TR_acmpeqHelper in downstream projects, TR_acmpHelper can be removed.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 4832e6f)
Ubuntu 16.04 is no longer supported. Use the 18.04 image for the x86-64 Linux build job, and disable the Linter until it can be made to work on current Ubuntu levels.
This will replace the implementation in Aliases.cpp. This API adds a new default argument `TR_OpaqueClassBlock *` to be used to check if the class is a value type or not.
Related to #6217
Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com> (commit: cbf479d)
This commit implements patchable GCRs on aarch64.
- Add a new helper `patchGCRHelper`.
- Change `bstoreEvaluator` to call out to `patchGCRHelper`.
- Change ARM64Trg1ImmSymInstruction to set immediate value for adr instruction for gcr-patch-point when generating instructions.
- Change ARM64ZeroSrc1ImmInstruction to set static address of the gcr-patch-point symbol when generating instructions.

This change does not fix anything in particular but provides general cleanup of the code:
- size should have type size_t, not int32_t
- memory for a local variable on the C stack should be zeroed
- remove the unnecessary declaration of two variables with the same name
Signed-off-by: Dmitri Pivkine <Dmitri_Pivkine@ca.ibm.com> (commit: 2fc665e)
These changes generalize the ConcurrentGC collector class so that it can be used for both the IncrementalUpdate and SATB styles of concurrent marking. They are a prerequisite to introducing SATB, since the existing concurrent collector class is closely tied to the Incremental Update approach. CM-enabling functionality needs to be abstracted: card table handling and card cleaning must be extracted to the derived IncrementalUpdate class. Since there has been no other implementation prior to SATB, ConcurrentGC has acted as the Incremental Update concurrent collector. Many of its components (e.g. state machine, initialization, tuning/adaptive parameters, concurrent state/phases, verbose logging) can be reused with this rework by extracting the card- and barrier-specific code.
- ConcurrentGC class is now an Abstract Class _(removed newInstance function)_
- Card cleaning specific adaptive params/factors moved to Incremental Update
- Common (Generalized) logic & vars given Protected access in Concurrent GC
- New methods introduced to help common up shared logic
  - signalThreadsToActivateWriteBarrierInternal
  - updateTuningStatisticsInternal
  - resetConcurrentParameters
  - preCompleteConcurrentCycle
  - finalConcurrentPrecollect
  - getTraceTarget
- External WB Entry points now call new collector specific handlers
  - J9ConcurrentWriteBarrierBatchStoreHandler
  - J9ConcurrentWriteBarrierStoreHandler

**Tuning logic Implemented for SATB**
- adjustTraceTarget
- tuneToHeap
Fix LabelSymbol constructor to always require a CodeGenerator
* Remove constructor without `CodeGenerator` parameter
* Replace `TR::comp()` reach inside each constructor with `CodeGenerator` query
* Add missing `_directlyTargeted` field initialization to all constructors
Add a new symbol flag to mark addresses within method bounds
A new flag StaticAddressWithinMethodBounds indicates that an address stored in a static symbol is inside method bounds, and can be accessed using RIP addressing during an out-of-process compilation.
Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com> (commit: f797475)
Ensure that created log files' names start with the specified name
Prior to this change, if suffixLogs (i.e. TR_EnablePIDExtension) was enabled (which by default it is), log files for compilation threads other than compilation thread 0 would be opened using a name consisting of only the extra suffix, e.g. ".193310.88163.20211109.150243.193310".
The cause was as follows:
1. When openLogFile() got a valid idSuffix, it would concatenate _logFileName with it into a stack buffer (filename).
2. Then when suffixLogs was also enabled, it would use the same stack buffer (filename) as the destination for the concatenation of the name so far (e.g. foo.log.1) with the PID and time. This was a call to sprintf() in which the destination buffer was also passed as an argument corresponding to a %s format specifier. This could (depending on platform?) behave as though the string argument was just an empty string.
Note that the compilation thread ID was also lost, so any threads other than compilation thread 0 could pass the same filename to trfopen() if they started to trace within the same second, resulting in too few log files created. Since trfopen() is implemented via fopen(), we can expect that such a log opened multiple times will be truncated on each open (after the first).
This problem is fixed by replacing the single buffer (filename) on the stack with a pair of buffers (buf0, buf1). At any given time, one is considered the next destination (destBuf). After formatting into destBuf, the buffers are swapped to ensure that it is safe to format into destBuf again. With this pair of buffers in place, it is no longer necessary to declare another buffer (tmp) for getFormattedName(). (commit: 1a1e980)
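The buffer-swapping scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual openLogFile() code: the function name `buildLogFileName`, the buffer size, and the "." separators are hypothetical, while `buf0`, `buf1`, and `destBuf` follow the names used in the description.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-in for the name construction in openLogFile().
// Each formatting step writes into destBuf and reads the name so far from
// the other buffer (or from the original name), so the destination is never
// also passed as a %s argument.
std::string buildLogFileName(const char *logFileName, const char *idSuffix, const char *pidAndTime)
   {
   char buf0[1024];
   char buf1[1024];
   char *destBuf = buf0;    // next destination for formatting
   char *otherBuf = buf1;   // becomes the destination after a swap
   const char *name = logFileName;

   if (idSuffix != nullptr)
      {
      std::snprintf(destBuf, sizeof(buf0), "%s.%s", name, idSuffix);
      name = destBuf;
      std::swap(destBuf, otherBuf);   // destBuf is safe to format into again
      }

   if (pidAndTime != nullptr)
      {
      std::snprintf(destBuf, sizeof(buf0), "%s.%s", name, pidAndTime);
      name = destBuf;
      std::swap(destBuf, otherBuf);
      }

   return std::string(name);
   }
```

Because the source and destination of each `snprintf` call are always distinct buffers, both the compilation thread ID and the PID/time suffix survive in the final name.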
Define TR_VerboseLog::CriticalSection as a vlog RAII lock guard
The verbose log can now be locked by declaring a local variable of type TR_VerboseLog::CriticalSection, and the lock will be automatically released at the end of the variable's scope. Locking this way is advantageous because it's not possible to forget to release the lock, and because the lock will always be released even if an exception is thrown.
In general, releasing a lock while unexpectedly unwinding for an exception could leave a program in a bad state. If the lock protects an invariant, it's possible to take the lock, temporarily violate the invariant, and then unwind before restoring it. However, in the case of the verbose log lock specifically, the only effect of unlocking on unwind will be to prematurely truncate the output being generated at the time. The output is just diagnostic, so it's best to simply release the lock.
There is an optional boolean parameter to specify whether or not to actually acquire the lock. This is useful for cases where we are incrementally generating output that needs to stay together, but we're doing so conditionally in the midst of other logic, e.g.
if (verbose) TR_VerboseLog::write(...);
...
if (verbose) TR_VerboseLog::write(...);
...
if (verbose) TR_VerboseLog::writeLine(...);
The boolean parameter allows for a sequence like this to use CriticalSection while still only locking when necessary:
TR_VerboseLog::CriticalSection vlogLock(verbose);
Without it, the use of CriticalSection would force the code either to lock each write individually, which is too fine a granularity and allows unwanted interleaving, or to take the lock even when not outputting to the verbose log. (commit: 38d242b)
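A minimal sketch of such a guard, assuming a `std::mutex` as the underlying lock, might look like the following. The class `VerboseLogSketch`, its `isLocked()` query, and the `writeWithUnwind` helper are hypothetical and exist only to make the behaviour observable; the real TR_VerboseLog uses its own locking primitives.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the verbose log; not the real TR_VerboseLog.
class VerboseLogSketch
   {
private:
   static std::mutex &logMutex() { static std::mutex m; return m; }
   // Demo-only bookkeeping, intended for single-threaded inspection.
   static bool &held() { static bool h = false; return h; }

public:
   class CriticalSection
      {
   public:
      // The optional parameter decides whether the lock is actually taken.
      explicit CriticalSection(bool lock = true) : _locked(lock)
         {
         if (_locked)
            {
            logMutex().lock();
            held() = true;
            }
         }

      // Runs at the end of the scope, including during exception unwinding.
      ~CriticalSection()
         {
         if (_locked)
            {
            held() = false;
            logMutex().unlock();
            }
         }

      CriticalSection(const CriticalSection &) = delete;
      CriticalSection &operator=(const CriticalSection &) = delete;

   private:
      bool _locked;
      };

   static bool isLocked() { return held(); }
   };

// Reports whether the lock was held inside the scope, then unwinds through
// the guard by throwing.
bool writeWithUnwind(bool verbose)
   {
   bool lockedInside = false;
   try
      {
      VerboseLogSketch::CriticalSection vlogLock(verbose);
      lockedInside = VerboseLogSketch::isLocked();
      throw std::runtime_error("unwinding");
      }
   catch (const std::runtime_error &)
      {
      }
   return lockedInside;
   }
```

In this sketch the lock is released on every exit path from the scope, and passing `false` makes the guard a no-op, which matches the conditional-output use case described above.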
A downstream project used overloading of ArrayStoreCHK to allow for checking of whether a null reference was being stored to an array whose component type is one that does not allow for null references. That overloading is no longer used in the downstream project, and the new nonNullableArrayNullStoreCheck non-helper symbol can be used to achieve the same effect. This change removes the potential overloading of ArrayStoreCHK.
Signed-off-by: Henry Zongaro <zongaro@ca.ibm.com> (commit: 232e20e)
For inlined callee methods that we skip creating HCR guards for, setting createdHCRGuard to false prevents creating an OSR transition point to be taken in cases such as a failed TR_ProfiledGuard.
Signed-off-by: Nazim Bhuiyan <nubhuiyan@ibm.com> (commit: 8a06709)
Do not rematerialize register for class pointer or method pointer
Do not rematerialize a register for a class pointer or method pointer in an AOT compilation, because register rematerialization does not have the node info needed to create a relocation record for the class pointer or method pointer.
Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com> (commit: c6a8248)
Implementation of SATB routines to enable SATB for `optavgpause` _(Limited to Xint+OOL Allocations)_
- Added obj alloc premarking to TLH Allocation interface
- Introduced some ASSERTS in shared ConcurrentGC code to ensure we don't transition to certain concurrent states and we don't call certain callbacks known to be unreachable by SATB

_Implemented the following methods for SATB collector:_
- `setupForConcurrent` - initial STW to mark roots and set allocation colour
- `doConcurrentTrace` - trace routine, adapted from the incremental approach and simplified to remove CARDS and "promote" Background threads activity
- `completeConcurrentTracing` - final STW to flush barrier packets and complete any remaining tracing before handing off to ParallelGlobalGC
- `setThreadsScanned` - to shade threads "black"
- `initialize` - currently there isn't much this method does, but it will be needed to register callbacks for GENCON and premarking TLH
AArch64: Improve MemoryReference class for array access
Improve MemoryReference in aarch64 codegen.
- Enable capturing aladd node in `populateMemoryReference` function.
- Enable use of scale and extend code.
- Simplify `consolidateRegisters` and `addToOffset` functions.
- Introduce `normalize` function to ensure that an invalid combination of base register, index register, and offset is not used.
Now that the Standard Overflow handler is used by SATB, the handler must notify the collector when an overflow occurs (e.g. to increment overflow stats and set the proper flags specific to the concurrent collector). This mirrors the Incremental CGC overflow handler (in ConcurrentOverflowHandler).
Furthermore, logic from `clearConcurrentWorkStackOverflow` that is specific to Concurrent Incremental has been pulled out from the base class; this resulted in memory corruption discovered by testing with `-Xcheck:memory`.
1. This commit deletes some inactive code in the PRE optimization.
2. There were some old mentions that were renamed to exception check removal.
3. Finally, tracing was improved in the cost benefit part of PRE.
AArch64: Provide factory methods for MemoryReference objects
Introduce static factory methods for MemoryReference objects into aarch64 codegen. Replace all direct calls to MemoryReference constructors with the corresponding factory methods. A separate PR will make the MemoryReference constructors private.
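The general pattern of static factory methods in front of a private constructor can be sketched as below. The class `MemoryReferenceSketch`, its factory names, the use of plain integers for registers, and the use of `std::unique_ptr` are assumptions made for this example; the real aarch64 MemoryReference class has different parameters and allocation.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a memory reference; registers are plain ints here.
class MemoryReferenceSketch
   {
public:
   enum { NoRegister = -1 };

   // Factory for the base register + displacement form.
   static std::unique_ptr<MemoryReferenceSketch> createWithDisplacement(int baseReg, int32_t displacement)
      {
      return std::unique_ptr<MemoryReferenceSketch>(
         new MemoryReferenceSketch(baseReg, NoRegister, displacement));
      }

   // Factory for the base register + index register form.
   static std::unique_ptr<MemoryReferenceSketch> createWithIndexReg(int baseReg, int indexReg)
      {
      return std::unique_ptr<MemoryReferenceSketch>(
         new MemoryReferenceSketch(baseReg, indexReg, 0));
      }

   int baseReg() const { return _baseReg; }
   int indexReg() const { return _indexReg; }
   int32_t displacement() const { return _displacement; }

private:
   // Private constructor: callers must go through a factory method.
   MemoryReferenceSketch(int baseReg, int indexReg, int32_t displacement)
      : _baseReg(baseReg), _indexReg(indexReg), _displacement(displacement)
      {
      }

   int _baseReg;
   int _indexReg;
   int32_t _displacement;
   };
```

Naming each factory after the addressing form it produces makes call sites easier to read than overloaded constructors, and keeping the constructor private leaves a single place to enforce any invariants.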