Ad-hoc Initialization Restrictions in Java/JVM
21 Jul 2022
If we search for the string “initializ” in the Java Language Specification (JLS8), we find 730 occurrences. For comparison, the word “object” appears 846 times. We can run the same search on the Java Virtual Machine Specification (JVMS8). The results are summarized below:
| | JLS8 | JVMS8 |
|---|---|---|
| “object” | 846 | 373 |
| “initializ” | 730 | 305 |
| “instance” | 819 | 323 |
These counts show how important initialization is in the design of Java and the Java virtual machine. I’ll use this post to track and criticize the parts of the Java and JVM specifications related to initialization. It will be updated from time to time.
JLS §8.8.7.1
An explicit constructor invocation statement in a constructor body may not refer to any instance variables or instance methods or inner classes declared in this class or any superclass, or use this or super in any expression; otherwise, a compile-time error occurs. (JLS §8.8.7.1)
This rule essentially forbids using `this` as a value in a super constructor call (or in a `this(...)` call). It corresponds to the rules for `uninitializedThis` in the JVM. The type `uninitializedThis` is a special type in the type system used for JVM bytecode verification (JVMS §4.10.1.2):
```
                          Top
                           |
           +---------------+----------------+
           |                                |
        oneWord                          twoWord
           |                                |
   +-------+--------+                  +----+----+
   |       |        |                  |         |
  int    float  reference            long     double
                    |
          +---------+-----------------------------+
          |                                       |
    uninitialized                          Java reference
          |                                type hierarchy
        +-+------------------+                    |
        |                    |                    |
uninitializedThis  uninitialized(Offset)        null
```
While verifying the bytecode of constructors, the JVM specification dictates that `this` initially takes the type `uninitializedThis` (JVMS §4.10):
```
instanceMethodInitialThisType(Class, Method, uninitializedThis) :-
    methodName(Method, '<init>'),
    classClassName(Class, ClassName),
    classDefiningLoader(Class, CurrentLoader),
    superclassChain(ClassName, CurrentLoader, Chain),
    Chain \= [].
```
As `uninitializedThis` is not a subtype of any Java reference type, `this` cannot be passed as an argument in a super constructor call. After the super constructor call, the object takes the type of the class of the constructor being checked.
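To make the source-level rule concrete, here is a minimal Java sketch. The names `SuperCallDemo`, `Parent`, `Child`, and `describe` are hypothetical and chosen only for illustration. Arguments to `super(...)` may use constructor parameters and static methods, while the commented-out lines show the kind of argument that JLS §8.8.7.1 rules out.

```java
public class SuperCallDemo {
    static class Parent {
        final String label;

        Parent(String label) {
            this.label = label;
        }
    }

    static class Child extends Parent {
        final int id;

        Child(int id) {
            // Allowed: a constructor parameter and a static method do not involve `this`.
            super(describe(id));
            // Rejected at compile time if used as the argument instead,
            // because both expressions refer to `this` (explicitly or implicitly):
            //   super(this.toString());
            //   super(toString());
            this.id = id; // fine: the super constructor call has returned
        }

        static String describe(int id) {
            return "child-" + id;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Child(7).label); // prints "child-7"
    }
}
```

At the bytecode level, the two rejected forms would require loading `this` while it still has the verification type `uninitializedThis`, which is not assignable to any reference type.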
The restrictions propagate to all JVM languages. For example, the following Scala code does not type check:
```scala
class A(a: A):
  def this() = this(this) // error: the argument `this` is type checked in the outer scope of class A
class B extends A(this) // error: same as above
```
However, the following code works fine:
```scala
class A:
  val b = new B(this)
class B(a: A)
```
As a programmer, you might wonder why passing an uninitialized value of type `A` to another constructor is allowed. Doesn’t it cause similar safety problems, as the following code shows?
```scala
class A:
  val b = new B(this)
  val n = 10
class B(a: A):
  println(a.n) // error: access uninitialized field a.n
```
The inconsistency is a sign of the flaws in the design.
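The same inconsistency can be reproduced in plain Java. In the sketch below, the names `EscapeDemo`, `A`, and `B` are hypothetical and simply mirror the Scala example. The compiler and the verifier accept `new B(this)` because the implicit `super()` call has already returned, yet `B` still observes the default value of `n`.

```java
public class EscapeDemo {
    static class B {
        final int seen;

        B(A a) {
            // Reads a.n before A's constructor has assigned it.
            seen = a.n;
        }
    }

    static class A {
        final B b;
        final int n;

        A() {
            // The implicit super() call has completed here, so `this`
            // is an ordinary reference of type A from now on.
            b = new B(this); // `this` escapes while n is still unassigned
            n = 10;
        }
    }

    public static void main(String[] args) {
        A a = new A();
        System.out.println(a.b.seen); // prints 0, the default value of an int field
        System.out.println(a.n);      // prints 10
    }
}
```

Even a `final` field can therefore be observed with its default value, which is the kind of safety problem the restriction on super constructor calls does not cover.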
Meanwhile, the restriction also makes some programming patterns impossible, as the following code demonstrates:
```scala
abstract class Context(outer: Context)
object RootContext extends Context(this) // error: cannot use `this`
```
Instead, programmers have to resort to `null` to make the code type check:
```scala
abstract class Context(outer: Context)
object RootContext extends Context(null)
```
We could use the following workaround, but it is more verbose:
```scala
abstract class Context:
  val outer: Context
object RootContext extends Context:
  val outer = this
```
The restriction on the usage of `this` in super constructor calls leads to inelegant design in the Scala language, as the following code demonstrates:
```scala
class A(x: Int):
  println(x)
class B(val y: Int) extends A(y): // what is the `y` in the super constructor call?
  foo()
  def foo() = println(y)
class C(override val y: Int) extends B(10)
@main
def Test = new C(20)
```
If `this` is not accessible in the super constructor call, then what is the meaning of `y` in the super constructor call `A(y)`? Running the program prints 10 and then 20. This means the `y` in the class body and the `y` in the super constructor call do not have the same meaning!
The behavior is surprising, but because of the JVM restriction there is nothing the language designer can do to fix the inconsistency.
We could explain the surprising behavior away by saying that `y` in the super constructor call is parameter access, while `y` in the method `foo` is property access. But that leaks too many implementation details to programmers. A new learner might ask:
“Why can’t we access the parameter in the method `foo`?”
We may say:
“Without the modifier `val`, it’s essentially the same as parameter access in `foo`.”
Then the learner would ask:
“Why doesn’t adding `val` change the meaning of `y` in the super constructor call?”
And the answer has to go back to the JVM restriction.
Conclusion
I hope the examples have convinced you that:
- The restrictions related to safe initialization in Java/JVM specifications are ad-hoc,
- They complicate the specification and implementation of programming languages,
- They prevent useful programming patterns.
Safe initialization is a complicated topic. I studied the problem during my PhD and implemented an initialization checker in Scala 3. One thing I learned in the process is that while it is a noble goal to have more safety built into a low-level intermediate representation (e.g., Java bytecode), it is better to leave safe initialization to high-level language design.