Four harmful Java idioms, and how to fix them

Rewrite convention and optimize your code for maintainability

Idioms in any programming language help you get your bearings and provide structure when you start writing code. But, as John O'Hanley points out, some Java programming idioms are both widespread and harmful, having a negative influence on code maintainability. Read on if you're ready to break with convention, rewrite four harmful Java idioms, and optimize your code for maintainability.

Coding idioms exist to make life easier for programmers -- in particular for maintenance programmers, who are usually reading code produced by someone else. Coding conventions are fundamentally about showing compassion for these maintenance workers; the conventions you choose should allow them to read, understand, and use your code as quickly and as painlessly as possible. The more you optimize your coding style for the mental experience of those who will be maintaining your code, the more compassionate your code becomes, and the more quickly it will be understood.

Similarly, higher-level idioms (immutability and package structure, for example) aim to both improve the design and make code easier to understand. In fact, some might say that "improving the design" and "making the code easier to understand" are, in the end, one and the same thing.

In this article, you'll see that some popular idioms should be changed in favor of alternatives that are demonstrably more compassionate towards the reader. One might argue that any idioms that are already widespread should never be abandoned, simply because they are expected by most readers. However, reader expectations are only one part of the equation, and they shouldn't necessarily override all other considerations. The fundamental question should not be "Does the reader expect this?" but rather "How fast will the reader understand it?" I'll approach four different problematic -- but common! -- idioms in turn.

Local variables, method arguments, and fields: Which is which?

When trying to understand code, a reader often needs to answer a simple question: Where did this data come from? In particular, when trying to understand a method, a reader needs to know which items are local variables, which are fields, and which are method arguments. To exercise compassion for your reader, and to absolutely minimize the effort required to understand your code, it's likely best to use a simple naming convention to distinguish between these three cases.

Many organizations qualify field variables with this to distinguish them from other kinds of variables. That's good, but it doesn't go far enough: there should be a convention to distinguish method arguments as well. The logic that justifies naming conventions for fields also applies to method arguments, and for exactly the same reasons.

Listing 1 offers an example of an equals() method that does not distinguish between the three kinds of variables.

Listing 1. A method that doesn't distinguish between variable types

public boolean equals (Object arg) {
  if (! (arg instanceof Range)) return false;
  Range other = (Range) arg;
  return start.equals(other.start) && end.equals(other.end);
}

The French writer Marcel Proust was famous for analyzing human experience to microscopic levels. How would a modern-day Proust describe the experience of reading this equals() method? When you try to understand it, you experience a small amount of discomfort when you encounter start, because it's suddenly referenced out of the blue. Your brain hits a speed bump as it encounters something it hasn't seen before -- Uh oh, what's this? After a moment, through a tedious process of elimination (the length of which is proportional to the size of the method), you will eventually figure out that start is actually a field. The same sort of trivial and tedious reasoning is also needed to distinguish method arguments from other types of data.

Listing 1 isn't very long. If the method were larger, then the effort necessary to understand it would be increased as well.

Why make the reader experience this jolt, this moment of confusion, however slight? Why not spare your readers the unpleasant experience of not understanding something immediately? Why force them to figure it out? If a simple naming convention can allow the reader to distinguish the three kinds of variables almost unconsciously, then shouldn't it be used? As Steve McConnell says in Code Complete, "It's OK to figure out murder mysteries, but you shouldn't need to figure out code. You should be able to read it."

One might argue that this is why fields should be declared at the start of a class. But that isn't a good solution, because it forces the reader to either memorize field names or perform a lookup operation in a distant section of the class. Either option makes the reader do more work. And the whole point is to exercise maximal compassion for the readers, and let them read and understand your code with minimal effort.

Listing 2 contains the example code from Listing 1, rewritten. It uses a naming convention in which:

  • Method arguments are prefixed with a
  • Fields are prefixed with f
  • Local variables have no prefix at all

Listing 2. Variable types are now clearly distinguished

public boolean equals (Object aOther) {
  if (! (aOther instanceof Range)) return false;
  Range other = (Range) aOther;
  return fStart.equals(other.fStart) && fEnd.equals(other.fEnd);
}

You might object to the style on display in Listing 2, protesting that "Hungarian notation is obsolete." But that objection is erroneous: Hungarian notation specifies type information. The above naming convention says nothing about type. Rather, it distinguishes between fields, arguments, and local variables. These are two completely different concepts.
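To make the distinction concrete, here is a minimal sketch of the convention in action (the Employee class and its members are hypothetical, invented purely for illustration):

public class Employee {
  // 'f' marks a field; a Hungarian prefix would instead repeat the type,
  // as in 'strName' for a String
  private String fName;

  public void rename(String aNewName) { // 'a' marks a method argument
    String trimmed = aNewName.trim();   // no prefix marks a local variable
    fName = trimmed;
  }
}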

The idea of using naming conventions in this way may seem trivial, but it isn't: when such conventions are used throughout your code, the effort needed to understand it is significantly reduced, because you can perform more operations without thinking about them.

"By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race. Before the introduction of the Arabic notation, multiplication was difficult, and the division even of integers called into play the highest mathematical faculties. Probably nothing in the modern world would have more astonished a Greek mathematician than to learn that ... a large proportion of the population of Western Europe could perform the operation of division for the largest numbers. This fact would have seemed to him a sheer impossibility ... Our modern power of easy reckoning with decimal fractions is the almost miraculous result of the gradual discovery of a perfect notation."
-- Alfred North Whitehead, An Introduction To Mathematics

Package-by-layer: Preventing the use of package-private scope

A common way of dividing up an application is package-by-layer:

  • com.blah.action
  • com.blah.dao
  • com.blah.model
  • com.blah.util

In other words, items belonging to any one feature are spread out over different packages, according to what might be called behavioral categories. Because the items in each feature need to be visible to each other, this implies that nearly all of the classes in such an application are public.

In effect, this common packaging style throws package-private scope out the window. Package-private scope isn't simply ignored; you're actually prevented from using it. Now, package-private scope was chosen as the default scope by the designers of the Java programming language. This choice was made, of course, to push you gently in the direction of better design -- to start with minimal scope, and increase it only if necessary. (This is the usual "minimize ripple effects by keeping secrets" technique, an idea at the very heart of object programming.) For some strange reason, a significant part of the Java community has, by adopting the package-by-layer convention, rejected an important way of minimizing scope. This seems unjustified.
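To see how the style forces these scope decisions, consider a sketch of two classes in a package-by-layer application (the class bodies are invented for illustration, but the package names follow the list above). Because its only callers live in a different package, the DAO cannot be package-private:

// In file com/blah/dao/PaintingDAO.java
package com.blah.dao;

public class PaintingDAO {  // forced to be public:
  public void save() {      // its callers live in another package
    //..elided
  }
}

// In file com/blah/action/PaintingAction.java
package com.blah.action;

import com.blah.dao.PaintingDAO;

public class PaintingAction {
  private PaintingDAO fDao = new PaintingDAO();
}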

An alternate style is package-by-feature:

  • com.blah.painting
  • com.blah.buyer
  • com.blah.seller
  • com.blah.auction
  • com.blah.webmaster
  • com.blah.useraccess
  • com.blah.util

Here, items are not grouped according to their behavioral style. Instead, the behavior of individual classes is treated as an implementation detail, and classes are grouped according to the highest possible level of abstraction -- the level of the feature -- wherein all items related to a feature (and only to that feature) reside in one and the same package. This is not a revolutionary or heretical idea; this is exactly what packages were created for in the first place.

For example, in a Web application, the com.blah.painting package might consist of these items:

  • Painting.java: A model object
  • PaintingDAO.java: A data access object
  • PaintingAction.java: A controller or action object
  • statements.sql: The SQL statements used by the DAO
  • view.jsp: The JSP used to render the result to the user

It's important to note that, in this style, each package should contain all items related to a feature (and only that feature). In particular, the package may contain files other than Java source code. In package-by-feature, the ideal is to pass the deletion test: you should be able to delete a feature by deleting a single directory, without leaving behind any cruft.
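Rewritten as a sketch in the package-by-feature style, those same two classes share a package, and the DAO quietly reverts to package-private scope, the default:

// In file com/blah/painting/PaintingDAO.java
package com.blah.painting;

class PaintingDAO {  // package-private: visible only within the feature
  void save() {
    //..elided
  }
}

// In file com/blah/painting/PaintingAction.java
package com.blah.painting;

public class PaintingAction {  // the feature's only public item
  private PaintingDAO fDao = new PaintingDAO();  // no import needed
}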

The benefits of this style over package-by-layer are compelling:

  • Packages have much higher cohesion and modularity. Coupling between packages is minimized.

  • Code is much more self-documenting. The reader gets a general idea of what features are included just by reading the package names. In Code Complete, Steve McConnell refers to self-documenting code as "the Holy Grail of legibility."

  • The design still honors the idea of separation of layers, simply by using separate classes within each feature.

  • Related items are in the same place. There is no need to navigate around the source tree to edit closely related items.

  • Items are package-private by default, as they should be. If an item needs to be visible from another package, then it's changed to public. (Note that changing a class to public does not mean that all of its members should be made public as well; there can be a mixture of public and package-private members in the same class.)

  • Deleting a feature is often as simple as deleting a single item: a directory.

  • There are fewer items in each package, and the overall package structure evolves more naturally. For example, if a package gets too large, then it can be refactored as desired into two or more new packages. The alternative package-by-layer style, however, does not scale or evolve in this way at all: its packages contain an arbitrarily large number of classes, and you cannot easily refactor the package structure.

Some frameworks promote the package-by-layer style, touting its conventional package names as an advantage: because the package names are predictable, programmers always know where to navigate to find items. But why force them to navigate at all? With the alternative package-by-feature style, such tedious navigation is rarely needed in the first place, which makes that sort of naming convention largely unnecessary.

"The single most important factor that distinguishes a well-designed module from a poorly designed one is the degree to which the module hides its internal data and other implementation details from other modules."
-- Joshua Bloch, Effective Java

JavaBeans: Why use them when immutables will do?

Immutable objects are objects that do not change state (their data, in other words) after construction. Martin Odersky, the principal creator of Scala, has recently praised the qualities of immutables. In Effective Java, Joshua Bloch makes a very convincing case in favor of strongly preferring immutable objects. To summarize Bloch's points, immutables:

  • Are simple
  • Are thread-safe, and require no synchronization
  • Can be shared freely
  • Never need to be defensively copied
  • Never need a copy constructor, or a clone() method
  • Make good building blocks for other classes
  • Make good Map keys and Set elements
  • Have failure atomicity -- that is, they cannot be left in an invalid state when an error occurs

"Classes should be immutable unless there is a very good reason for making them mutable," says Bloch. But his advice seems to have been largely ignored. In place of immutable objects, most applications use JavaBeans (or something similar) instead. JavaBean objects are significantly more complex than immutables. Their complexity arises from their large state space. In a rough sense, you might think of a JavaBean as the diametric opposite of an immutable object: it allows maximal mutability.

It's common to use JavaBeans to model database records. Is that an appropriate design? Think of it this way: if you were modeling a row from a database ResultSet, without any preconceptions or framework constraints whatsoever, what would your design be like? Would it be similar to a JavaBean, or would it be significantly different?

I think it would be enormously different:

  • It would likely not include a no-argument constructor, as such constructors carry no data. Does it make sense to model a database ResultSet by creating objects that by design can carry no data, even temporarily? Does a no-argument constructor suggest itself naturally to the mind for this domain? No, it doesn't. (It's true that data is often optional, but how often are all columns in a ResultSet optional?)

  • It would likely not have anything to say about events and listeners.

  • It would likely not force you to use mutable objects. In Web applications in particular, model objects often exist only for the duration of a single request. Such objects aren't long-lived, and don't need to change state in response to user actions.

  • It would likely define a data validation mechanism. Such validation is one of the most important features of database apps, and should be supported directly in the model. (Remember the first thing you learned about objects: an object encapsulates both data and operations on the data. In this case, the operations are the validations.)

  • The validation mechanism would allow for reporting errors to the end user.
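A hedged sketch of such a model object, with invented names and a deliberately simple validation mechanism, might look like this:

public final class PaintingRecord {
  public PaintingRecord(Long aId, String aName) {
    // Validation lives in the model, and its message is fit for the end user
    if (aName == null || aName.trim().length() == 0) {
      throw new IllegalArgumentException("Please provide a painting name.");
    }
    fId = aId;  // the id may legitimately be null for an unsaved record
    fName = aName;
  }

  public Long getId() { return fId; }
  public String getName() { return fName; }

  // Implementation details appear last (see the next section)
  private final Long fId;
  private final String fName;
}

A production design would likely collect all validation messages for display rather than throwing on the first failure, but the essential point stands: the object can never be observed in an invalid state.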

The JavaBeans specification was created for a very particular problem domain: that of manipulating graphical widgets at design time. The specification makes absolutely no mention of databases, because it was not created with databases in mind. As a result, it's not surprising that JavaBeans aren't a very natural choice for the problem of modeling database records.

From a practical perspective, many widely-used frameworks require application programmers to use JavaBeans (or something similar) to model database records. This is deeply unfortunate, because it doesn't allow programmers to take advantage of the many positive qualities of immutable objects.

Private members: Why put them up front?

Old Hollywood movies usually start with the credits -- all of the credits. In a similar way, most Java classes place implementation details (the private members) first. Listing 3 shows a typical example of this style.

Listing 3. Private members, listed up front

public class OilWell implements EnergySource {
   private Long id;
   private String name;
   private String location;
   private Date discoveryDate;
   private Long totalReserves;
   private Long productionToDate;
   
   public Long getId() {
      return id;
   }
   public void setId(Long id) {
      this.id = id;
   }
   
  //..elided
}

However, placing private items last, not first, seems to show more compassion for the reader. When trying to understand something -- anything -- you should move from the general to the specific. More precisely, you should move from a higher to a lower level of abstraction. If you do the reverse, then the reader initially doesn't know what the point of the whole thing is, or how the pieces work together, and is forced to juggle a number of potentially disconnected facts.

The whole point of abstraction is to allow you to ignore details. The higher the level of abstraction, the more details you can ignore. The more details the readers of a class can ignore, the happier they are. Holding many details in your head is painful, so fewer details are better. Thus, placing private items last seems more compassionate, as it puts off details that may not be necessary at all for the reader's current task (whatever that may be).

Originally, the C++ programming culture placed private items first, just as in Java. However, unlike the Java community, the C++ community quickly came to realize that this was an unhelpful convention, and it is now reversed. Here is a comment from a typical C++ style guide:

Notice that the public interface is placed first in the class, protected next, and private last. The reasons are:

  • programmers should care about a class's interface more than implementation
  • when programmers need to use a class they need the interface not the implementation

It makes sense then to have the interface first. Placing implementation, the private section, first is a historical accident, as the first examples used the private-first layout. Over time the emphasis has shifted away from implementation details and toward the class's interface.

Similarly, the Imperial College London C++ Style Guide says, "By placing the public section first, everything that is of interest to a user is gathered in the beginning of the class definition. The protected section may be of interest to designers when considering inheriting from the class. The private section contains details that should have the least general interest."

One might object that readers should use Javadoc when trying to understand a class, and not the source code. This argument doesn't hold, however, because Javadoc includes no information regarding implementation details. When maintaining a class, a programmer clearly needs access to the source code.

For all kinds of technical documentation, it's commonplace to put high-level information at the start -- the table of contents of a book, for instance, or the abstract of an academic paper. Why should a Java class be an exception to this rule? Placing private members first just seems like a bad habit waiting to be broken. The reason it exists in the first place is likely early coding style documents, such as the Coding Conventions published by Sun. As Joshua Bloch noted, "I wouldn't take that document [Sun's Coding Conventions] too seriously. It is not actively maintained or used at Sun."

It seems best to arrange code to imitate the same order used by Javadoc: first the constructors, then non-private methods, and finally private fields and methods. That's the only style in which the reader moves naturally from a higher to a lower level of abstraction.
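Applied to Listing 3, that ordering might look like the following sketch (only the id property is shown, renamed to follow the naming convention from the first section of this article):

public class OilWell implements EnergySource {

   public Long getId() {
      return fId;
   }
   public void setId(Long aId) {
      fId = aId;
   }

   //..elided

   // Implementation details appear last, where they distract the least
   private Long fId;
}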

In conclusion

I have argued in this article that four common Java idioms should be modified. The ultimate justification for such changes is that they will make code demonstrably easier to read, understand, and use -- and, in the process, they will exhibit more compassion for the mental experience of the reader. In the case of immutability and packaging style, they will also nudge you in the direction of improved design.

In summary, I suggest that the following idioms should be favored as the preferred style:

  • Use a naming convention to distinguish three kinds of data, not two: local variables, fields, and method arguments.
  • Prefer the package-by-feature style over package-by-layer.
  • Prefer immutable objects over JavaBeans.
  • Order items in a class in terms of decreasing scope, with private items appearing last.

If the arguments presented in this article are correct, then they lead to a question: Why do harmful idioms still persist in Java programming culture, after so many years? In answer to that question, I would offer the following speculation. Style guides and other tools published in the early years of Java didn't have the benefit of long experience. They got most things right, but a few things wrong. This is completely natural, and doesn't reflect poorly on their creators. Some of these items, which were published by Sun Microsystems, were very widely imitated. They had a very strong influence, and were often treated as authoritative. Perhaps they were treated with a little too much deference.

The few harmful idioms that were created in the early days of Java have persisted because they became popular, ingrained habits -- and perhaps because of a little bit of consensus trance or groupthink, wherein the fact that something is popular or treated as authoritative often overrides any opposing viewpoint.

Will Java programming culture gradually change these bad habits, or will they persist, in spite of their defects? My guess is that, unfortunately, they will remain a part of Java programming culture. But I would love to be proved wrong!

Author Bio

John O'Hanley is the author of javapractices.com, and the creator of the WEB4J framework. He has been programming for 10 years, and currently lives in Prince Edward Island, Canada.
