Fengyun Liu

Ad-hoc Initialization Restrictions in Java/JVM

If we search for the stem “initializ” in the Java Language Specification (JLS8), we find 730 occurrences. For comparison, the word “object” appears 846 times. A similar search can be run on the Java Virtual Machine Specification (JVMS8). The results are summarized below:

               JLS8   JVMS8
“object”        846     373
“initializ”     730     305
“instance”      819     323

These statistics show how central initialization is to the design of Java and the Java virtual machine. I’ll use this post to track and criticize the parts of the Java/JVM specifications related to initialization. It will be updated from time to time.

JLS §8.8.7.1

An explicit constructor invocation statement in a constructor body may not refer to any instance variables or instance methods or inner classes declared in this class or any superclass, or use this or super in any expression; otherwise, a compile-time error occurs. (JLS §8.8.7.1)

This rule essentially forbids using this as a value in the arguments of an explicit constructor call, i.e. this(...) or super(...). It corresponds to the rules for uninitializedThis in the JVM. uninitializedThis is a special type in the type system of the JVM bytecode verifier (JVMS §4.10.1.2):

                                     Top
                                      |
                                      |
                   +---------------------------------------+
                   |                                       |
                   |                                       |
                oneWord                                 twoWord
                   |                                       |
     +---------------------------+                         |
     |             |             |                  +--------------+
     |             |             |                  |              |
    int          float      reference               |              |
                                 |                 long          double
                                 |
                   +---------------------------------------+
                   |                                       |
                   |                                       |
             uninitialized                           Java Reference
                   |                                 type hierarchy
                   |                                       |
        +-----------------------+                          |
        |                       |                          |
        |                       |                         null
 uninitializedThis    uninitialized(Offset)

While verifying the bytecode of constructors, the JVM specification dictates that this initially takes the type uninitializedThis (JVMS §4.10):

instanceMethodInitialThisType(Class, Method, uninitializedThis) :-
  methodName(Method, '<init>'),
  classClassName(Class, ClassName),
  classDefiningLoader(Class, CurrentLoader),
  superclassChain(ClassName, CurrentLoader, Chain),
  Chain \= [].

As uninitializedThis is not a subtype of any Java reference type, this cannot be passed as an argument to a super constructor call. After the super constructor call, this takes the type of the class whose constructor is being checked.

The restrictions propagate to all JVM languages. For example, the following Scala code does not type check:

class A(a: A):
  def this() = this(this)    // error: the argument `this` is type checked in the outer scope of class A

class B extends A(this)      // error: same as above

However, the following code works fine:

class A:
  val b = new B(this)

class B(a: A)

As a programmer, you might wonder why passing an uninitialized value of type A to another constructor is allowed here. Doesn't it cause the same kind of safety problem, as the following code shows?

class A:
  val b = new B(this)
  val n = 10

class B(a: A):
  println(a.n) // error: access uninitialized field a.n

The inconsistency is a sign of the flaws in the design.
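To make the hazard concrete, here is a minimal runnable sketch of the same shape. It assumes the code is compiled without an initialization checker enabled, so that it is accepted. The names Leaky, Observer, seen and leakDemo are made up for illustration. The observer stores what it reads from the half-constructed object instead of printing it.

```scala
// Hypothetical names; same shape as the A/B example above.
class Observer(a: Leaky):
  // Runs while `a` is still under construction: `a.n` has not been
  // assigned yet, so the JVM default value of an Int field is read.
  val seen: Int = a.n

class Leaky:
  val b = new Observer(this) // `this` escapes before `n` is initialized
  val n = 10

@main def leakDemo(): Unit =
  val a = new Leaky
  println(a.b.seen) // value read during construction: 0
  println(a.n)      // value after construction: 10
```

The JVM verifier accepts this because this is passed to the other constructor only after the super constructor call has returned, even though the fields of the object have not all been assigned yet.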

Meanwhile, the restriction also makes some programming patterns impossible, as the following code demonstrates:

abstract class Context(outer: Context)

object RootContext extends Context(this) // error: cannot use `this`

Instead, programmers have to resort to null to make the code type check:

abstract class Context(outer: Context)

object RootContext extends Context(null)

We could use the following workaround, but it is more verbose:

abstract class Context:
  val outer: Context

object RootContext extends Context:
  val outer = this
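As a rough illustration of how this workaround might be used, the sketch below adds a hypothetical ChildContext class and a depth helper (neither is part of the original design). It treats the self-referential root as the end of the chain, so no null is needed.

```scala
abstract class Context:
  val outer: Context

object RootContext extends Context:
  val outer: Context = this // the root is its own outer context

// Hypothetical subclass for nested contexts.
class ChildContext(val outer: Context) extends Context

// Hypothetical helper: counts the steps from `ctx` out to the root,
// recognizing the root as the context whose outer is itself.
def depth(ctx: Context): Int =
  if ctx.outer eq ctx then 0 else 1 + depth(ctx.outer)

@main def contextDemo(): Unit =
  val inner = new ChildContext(new ChildContext(RootContext))
  println(depth(RootContext)) // 0
  println(depth(inner))       // 2
```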

The restriction on the usage of this in super constructor calls leads to inelegant design in the Scala language, as the following code demonstrates:

class A(x: Int):
  println(x)

class B(val y: Int) extends A(y):  // what is the `y` in the super constructor call?
  foo()
  def foo() = println(y)

class C(override val y: Int) extends B(10)

@main
def Test = new C(20)

If this is not accessible in the super constructor call, then what is the meaning of y in the super constructor call A(y)? Running the program prints 10 and then 20. In other words, the y in the class body and the y in the super constructor call do not have the same meaning!
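For readers who want to check this themselves, here is a variant of the program that records the two values instead of printing them directly. The Trace object and the traceDemo entry point are hypothetical helpers added for this sketch. The classes are otherwise unchanged.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: collects the observed values in order.
object Trace:
  val seen = ListBuffer.empty[Int]

class A(x: Int):
  Trace.seen += x

class B(val y: Int) extends A(y): // `y` here is B's constructor parameter
  foo()
  def foo(): Unit = Trace.seen += y // `y` here is the member, resolved on `this`

class C(override val y: Int) extends B(10)

@main def traceDemo(): Unit =
  new C(20)
  println(Trace.seen.mkString(" ")) // 10 20
```

The first recorded value is the argument that B passes to A, and the second is the member y as overridden in C.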

The behavior is surprising. However, because of the JVM restriction, there is nothing the language designer can do to fix the inconsistency.

We could explain the surprising behavior away by saying that y in the super constructor call is parameter access, while y in the method foo is property access. But that leaks too many implementation details to programmers. A new learner might ask:

Why can't we access the parameter in the method foo?

We may say that

Without the val modifier, accessing y in foo is essentially parameter access.

Then the learner would ask:

Why doesn't adding val change the meaning of y in the super constructor call?

And the answer has to go back to the JVM restriction.

Conclusion

I hope the examples have convinced you that

  • The restrictions related to safe initialization in Java/JVM specifications are ad-hoc,
  • They complicate the specification and implementation of programming languages,
  • They prevent useful programming patterns.

Safe initialization is a complicated topic. I studied the problem during my PhD and implemented an initialization checker in Scala 3. One thing I learned in the process is that while it is a noble goal to have more safety built into a low-level intermediate representation (e.g., Java bytecode), it is better to leave safe initialization to high-level language design.