Ad-hoc Initialization Restrictions in Java/JVM
21 Jul 2022
If we search for the stem “initializ” in the Java Language Specification (JLS 8), we find 730 occurrences. For comparison, the word “object” appears 846 times. We can do a similar search in the Java Virtual Machine Specification. We summarize the results below:
The statistics show how important initialization is in the design of Java and the Java virtual machine. I’ll use this post to track and criticize the parts of the Java/JVM specifications related to initialization. It will be updated from time to time.
An explicit constructor invocation statement in a constructor body may not refer to any instance variables or instance methods or inner classes declared in this class or any superclass, or use this or super in any expression; otherwise, a compile-time error occurs. (JLS §8.8.7.1)
This rule basically forbids the usage of `this` as a value in a super constructor call. It corresponds to the rules related to `uninitializedThis` in the JVM. The type `uninitializedThis` is a special type in the type system of JVM bytecode verification (JVMS §4.10.1.2):
                                  Top
                                   |
                   +---------------+---------------+
                   |                               |
                oneWord                         twoWord
                   |                               |
          +--------+--------+                 +----+----+
          |        |        |                 |         |
         int     float  reference            long    double
                            |
              +-------------+-------------------------+
              |                                       |
        uninitialized                          Java Reference
              |                                type hierarchy
        +-----+---------------+                       |
        |                     |                       |
    uninitializedThis  uninitialized(Offset)        null
While verifying the bytecode of constructors, the JVM specification dictates that `this` takes the type `uninitializedThis` (JVMS §4.10):
    instanceMethodInitialThisType(Class, Method, uninitializedThis) :-
        methodName(Method, '<init>'),
        classClassName(Class, ClassName),
        classDefiningLoader(Class, CurrentLoader),
        superclassChain(ClassName, CurrentLoader, Chain),
        Chain \= [].
Because `uninitializedThis` is not a subtype of any Java reference type, `this` cannot be used as a value in a super constructor call. After the super constructor call, the object takes the type of the class of the constructor being checked.
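To make the subtyping argument concrete, here is a toy model of part of the hierarchy above, written in Scala. It is only a sketch: the names `VType`, `parent` and `isSubtype` are my own, class types are flattened into a single `ClassType` case with no class hierarchy, and `null` is left out. It is not the Prolog formulation used by the JVMS.

```scala
// Toy model of part of the verification type hierarchy (hypothetical names).
enum VType:
  case Top, OneWord, TwoWord, IntT, FloatT, LongT, DoubleT
  case Reference, Uninitialized, UninitializedThis
  case UninitializedAt(offset: Int)
  case ClassType(name: String) // simplification: no class hierarchy, no null

import VType.*

// Direct supertype of each type in the toy hierarchy.
def parent(t: VType): Option[VType] = t match
  case Top                                    => None
  case OneWord | TwoWord                      => Some(Top)
  case IntT | FloatT | Reference              => Some(OneWord)
  case LongT | DoubleT                        => Some(TwoWord)
  case Uninitialized | ClassType(_)           => Some(Reference)
  case UninitializedThis | UninitializedAt(_) => Some(Uninitialized)

// Reflexive, transitive closure of `parent`.
def isSubtype(s: VType, t: VType): Boolean =
  s == t || parent(s).exists(isSubtype(_, t))

@main def verifierDemo(): Unit =
  println(isSubtype(UninitializedThis, Reference))      // true
  println(isSubtype(UninitializedThis, ClassType("A"))) // false
```

In this model, a value of type `UninitializedThis` can never be supplied where a `ClassType` is expected, which is the shape of the restriction described above.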
The restrictions propagate to all JVM languages. For example, the following Scala code does not type check:
    class A(a: A):
      def this() = this(this) // error: the argument `this` is type checked in the outer scope of class A

    class B extends A(this) // error: same as above
However, the following code works fine:
    class A:
      val b = new B(this)

    class B(a: A)
As a programmer, you might wonder why passing an uninitialized value of type `A` to another constructor is allowed. Won’t it cause similar safety problems, as the following code shows?
    class A:
      val b = new B(this)
      val n = 10

    class B(a: A):
      println(a.n) // error: access uninitialized field a.n
The inconsistency is a sign of the flaws in the design.
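To observe the problem at run time, here is a small sketch of the same shape. It assumes the code is compiled without the optional initialization checker, so it compiles without complaint; the names `Leaky` and `Observer` are hypothetical.

```scala
// Same shape as the example above, with hypothetical names.
class Observer(a: Leaky):
  // Runs while `a` is still under construction.
  val seen: Int = a.n

class Leaky:
  val b = new Observer(this) // `this` escapes before `n` is assigned
  val n = 10

@main def leakDemo(): Unit =
  val a = new Leaky
  println(a.b.seen) // 0: the default value observed during construction
  println(a.n)      // 10: the value after construction
```

The JVM verifier accepts this program because `this` already has the type of the class once the super constructor has returned, even though the fields of the class are not yet assigned.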
Meanwhile, the restriction also makes some programming patterns impossible, as the following code demonstrates:
    abstract class Context(outer: Context)
    object RootContext extends Context(this) // error: cannot use `this`
Instead, programmers have to resort to `null` to make the code type check:
    abstract class Context(outer: Context)
    object RootContext extends Context(null)
We could use the following workaround, but it is more verbose:
    abstract class Context:
      val outer: Context

    object RootContext extends Context:
      val outer = this
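A runnable sketch of this workaround follows. The `depth` helper and the `ChildContext` class are hypothetical additions, included only to show that the self-referencing root behaves as expected.

```scala
abstract class Context:
  val outer: Context
  // Hypothetical helper: number of hops from this context to the root.
  def depth: Int = if outer eq this then 0 else outer.depth + 1

object RootContext extends Context:
  val outer: Context = this // the root is its own outer context

// Hypothetical non-root context.
class ChildContext(val outer: Context) extends Context

@main def contextDemo(): Unit =
  val child = ChildContext(ChildContext(RootContext))
  println(RootContext.outer eq RootContext) // true
  println(child.depth)                      // 2
```

The cost of the workaround is visible here: `outer` becomes an abstract member that every subclass has to define, instead of a constructor parameter.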
The restriction on the usage of `this` in super constructor calls leads to inelegant design in the Scala language, as the following code demonstrates:
    class A(x: Int):
      println(x)

    class B(val y: Int) extends A(y): // what is the `y` in the super constructor call?
      foo()
      def foo() = println(y)

    class C(override val y: Int) extends B(10)

    @main
    def Test = new C(20)
If `this` is not accessible in the super constructor call, then what is the meaning of `y` in the super constructor call `A(y)`? Running the program prints `10` followed by `20`. This means that the `y` in the class body and the `y` in the super constructor call do not have the same meaning!
The behavior is surprising. However, there is nothing that the language designer can do to fix the inconsistency, due to the JVM restriction.
We could explain the surprising behavior away by saying that `y` in the super constructor call is parameter access, while `y` in the method `foo` is property access. But that leaks too many implementation details to programmers. A new learner might ask:
“Why can’t we access the parameter in the method `foo`?”
We may say that
“Without the modifier `val`, it’s essentially the same as parameter access in `foo`.”
Then the learner would ask:
“Why does adding `val` not change the meaning of `y` in the super constructor call?”
And the answer has to go back to the JVM restriction.
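One way to picture the distinction between parameter access and property access is to write the accessor by hand. The sketch below is only an illustration, not the compiler's actual output. The names `SuperA`, `MidB`, `LeafC`, `yB` and `yC` are hypothetical, and the `println` calls of the original example are replaced by fields so that the two values can be inspected.

```scala
class SuperA(x: Int):
  val seenBySuper: Int = x

class MidB(yB: Int) extends SuperA(yB): // parameter access: no `this` involved
  def y: Int = yB                        // stands in for the accessor that `val y` generates
  val seenInBody: Int = y                // property access: a virtual call on `this`

class LeafC(yC: Int) extends MidB(10):
  override def y: Int = yC

@main def accessDemo(): Unit =
  val c = LeafC(20)
  println(c.seenBySuper) // 10
  println(c.seenInBody)  // 20
```

Written this way, the super constructor argument can only ever be the parameter `yB`, while the body goes through the overridable member `y`.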
I hope the examples above have convinced you that:
- The restrictions related to safe initialization in Java/JVM specifications are ad-hoc,
- They complicate the specification and implementation of programming languages,
- They prevent useful programming patterns.
Safe initialization is a complicated topic. I studied the problem during my PhD and implemented an initialization checker in Scala 3. One thing I learned in the process is that while it is a noble goal to have more safety built into a low-level intermediate representation (e.g., Java bytecode), it is better to leave safe initialization to high-level language design.