When Encapsulation is a Sin

The popularity of object-oriented programming has made it a maxim that encapsulation is a good thing that leads to better modularity and extensibility. But is encapsulation necessarily a good thing in every scenario?

It’s clearly good to encapsulate the implementation of a list, map or set, since doing so abstracts away details that are inessential to the task at hand. But is it always good to hide the data structure of a TCP packet or the database schema of an information system?

There are several arguments for encapsulation:

  1. Shield inessential details: least knowledge principle
  2. Defend against external changes: encapsulate change
  3. Changeability: separation of interface and implementation

When it comes to core data definitions, none of these arguments holds. By core data definitions, I mean the database schema in an information system, the packet structure in a network protocol implementation, the abstract syntax tree in a compiler, and so on.

First, core data definitions are the foundation of a system, so they should be shared knowledge among all team members. For programmers, this kind of knowledge is not a cognitive burden but an enabling insight that helps them understand, develop and maintain the system. Core data definitions are essential details. If there are any accidental details in the core definitions, then the definitions themselves are problematic and should be improved.

Second, for core data definitions there’s no external change to encapsulate; guarding against external change is the job of an abstract interface. Thus the second argument doesn’t apply.

Third, does encapsulation of core data definitions make the system more extensible or changeable? I can hardly see how obscuring a packet structure behind helper methods makes the protocol more extensible, or how encapsulating the fields of an AST definition makes the intermediate language easier to extend.

On the contrary, encapsulating core data structures only harms understanding and communication by making the core data definitions indirect, obscure and unnecessarily complex. In this regard, encapsulation actually increases the cognitive burden on programmers. It’s an obstacle to understanding, communication, reasoning and productivity.

In my opinion, core data structures should be made as explicit, direct, simple and clear as possible.
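
A minimal sketch of the contrast, in Python. The header here is hypothetical and heavily simplified (three fields, not a real TCP layout), and both class names are invented for illustration. The first version hides the shape of the data behind accessors; the second states it directly.

```python
from dataclasses import dataclass


# Encapsulated style: the shape of the data is hidden behind accessors,
# so a reader must scan every method to learn what a header contains.
class OpaqueHeader:
    def __init__(self, raw: bytes):
        self._raw = raw

    def source_port(self) -> int:
        return int.from_bytes(self._raw[0:2], "big")

    def dest_port(self) -> int:
        return int.from_bytes(self._raw[2:4], "big")

    def sequence(self) -> int:
        return int.from_bytes(self._raw[4:8], "big")


# Explicit style: the definition itself says what a header is.
@dataclass(frozen=True)
class Header:
    source_port: int
    dest_port: int
    sequence: int


# The same (hypothetical) header expressed both ways.
raw = bytes([0x1F, 0x90, 0x00, 0x50, 0x00, 0x00, 0x00, 0x2A])
opaque = OpaqueHeader(raw)
explicit = Header(source_port=8080, dest_port=80, sequence=42)
```

Both objects carry the same information, but only the second definition can be read at a glance and shared as common vocabulary.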

More radically, I think that in a project the core data definitions should go into a single file! When I was playing with the Linux kernel, I liked that the core struct definitions are centralized in just a few .h header files. When I was working on Ruby on Rails projects, I liked the file db/schema.rb a lot, as it holds the database schema in a single file. In Scala projects, I prefer to have all case classes defined together in a single file or package.

It’s OK to have helper methods that deal with core data definitions. However, I think they should not live in the same place as the core data definitions. Putting them together only obscures the core data definitions and makes reading and understanding more demanding! Helper methods are cognitively inessential, and thus should not be mixed with the essential stuff.
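
One way this separation could look, sketched in Python. The schema (User, Order) and the helper names are made up for illustration, and the file names in the comments are assumptions; the two parts are shown in one block only so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import List

# --- Core definitions: would live alone in, say, schema.py (hypothetical) ---


@dataclass(frozen=True)
class User:
    id: int
    name: str
    email: str


@dataclass(frozen=True)
class Order:
    id: int
    user_id: int
    total_cents: int


# --- Helpers: would live elsewhere, say order_helpers.py (hypothetical) ---


def format_total(order: Order) -> str:
    """Render the order total as a decimal string, e.g. 1999 -> '19.99'."""
    return f"{order.total_cents // 100}.{order.total_cents % 100:02d}"


def orders_of(user: User, orders: List[Order]) -> List[Order]:
    """Return the orders that belong to the given user."""
    return [o for o in orders if o.user_id == user.id]
```

Someone who opens the definitions file sees only the data model; the helpers can grow without adding noise to it.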

Philosophically, the core definitions are the ontological commitments of the system. Occam’s razor should be used to keep them as minimal as possible. They should also be as simple, clear, explicit and free of noise as possible.

From the language point of view, abstraction is acceptable, but encapsulation is bad. Algebraic data types, tuples, records, vectors, lists, maps, structs and interfaces are the best means of defining core data types. Classes are the worst choice for making ontological commitments. The reason is that with classes it’s easy to add noise that obscures the ontological commitments, and to make implicit commitments that compromise communication and reasoning about the system.
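
As a rough illustration, here is a tiny expression AST written in the algebraic-data-type style. Python has no built-in algebraic data types, so this sketch approximates them with frozen dataclasses used as plain records plus a Union; the node names (Num, Add, Mul) and the evaluate function are invented for the example.

```python
from dataclasses import dataclass
from typing import Union


# Each node is a plain record: its fields are the whole definition.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


# The sum type: an expression is exactly one of these three shapes.
Expr = Union[Num, Add, Mul]


# Behaviour is defined outside the data, by case analysis on the shape.
def evaluate(e: Expr) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Mul):
        return evaluate(e.left) * evaluate(e.right)
    raise TypeError(f"not an expression: {e!r}")


# 1 + 2 * 3
example = Add(Num(1), Mul(Num(2), Num(3)))
```

The whole ontology of the language fits in a dozen lines, and nothing in the definitions can hide extra state or behaviour.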

The expression problem has nothing to do with core definitions. Generally, we should not pretend that we can change fundamental assumptions without affecting existing code, even if some complex trick can achieve such a design in a specific setting. However, I’m not against using such tricks at higher levels.