A More Effective Java Coding Standard

Posted by Keith McMillan

March 6, 2008 | 2 Comments

Over the years, I’ve read a fair number of Java coding standards. Very few of them, I’m going to go so far as to say “none”, really talk about what they should. That’s a pretty bold statement, so let me explain.

Many coding standards for Java talk about low-value minutiae, things like “your variables should begin with a lower case letter, and be meaningful names”, and where you should place your braces. I’d like a quarter for every 30 seconds that have been wasted talking about that last point. But I digress.

The things first-generation coding standards talk about are making our code “uniform” and “acceptable”, and they have been beaten to death. These are also largely things that can be handled very nicely by your modern-day integrated development environment, thank you very much. Renaming a field takes almost no time at all, and I can set my IDE to reformat the braces just how I like, or however the project dictates they should be formatted. These standards, while they create uniform-looking code, give really very little return on the time you spend creating them. What we really need today are coding standards that talk about more valuable things, the things that are going to save us real time and prevent errors. Think of these as graduate-level coding standards.

Guidance not Dogma

Before we set off, I should point out that almost every rule in this proposed list of coding standards can be broken. “What? Isn’t this a Standard we’re proposing?” I hear you cry. This list is meant to be guidance: we can always come up with some strange situation where the correct thing to do is disregard the rules we’re going to discuss. It’s not possible to formulate a coding standard that provides completely valid guidance in all situations and provides more than trivial guidance. So these guidelines should not be viewed as so rigid that they’re immune from question, or we arrive at simple dogma, and will find ourselves doing the wrong thing for what we believe is the right reason.

In the majority of cases, you can safely use these guidelines as the “normal” way to do things, but if a rule doesn’t work in some case, and you can convince yourself (and your teammates) that you’re right, then by all means, disregard the rule: just don’t do it without thinking, or having a good reason.

The Usual Suspects

The rules I propose here don’t replace all those coding standards you’ve already created, in spite of their lower value, and it would be foolish to suggest that we should disregard the “Java standard” coding practices, that is to say how you name your classes, your variables, etc. We’re looking to add additional value here by proposing the next level of guidance, not to replace sensible standards. These “usual suspects” should not be the main focus of your coding standards, however: pick a reasonable, freely available standard, and reference it from yours. Spend your time focusing on providing better guidance on harder topics.

Throw Meaningful Exceptions

The very first point I’ll make is that it’s important to throw meaningful exceptions. Meaningful is defined by the layer of the application you’re working in. If you’re dealing with a persistence layer, then “meaningful” probably looks something like “save failed” or “retrieve failed”.

Feel free to create a hierarchy of meaningful exceptions, with more detailed exceptions extending the general one if you like, but give the business logic the opportunity to understand what happened: that the operation failed and how. This means that you’re going to have to wrap up more granular exceptions (e.g. “login failed”) into your meaningful exception.

The definition of “meaningful” changes as we move between layers of the application: a meaningful exception for the persistence layer is too detailed to be thrown out of the business logic layer. Exceptions should generally be wrapped up when crossing from one layer to the next. And don’t throw away a perfectly good exception when you create a new one: wrap it up inside your new exception and throw that.
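
Here is a minimal sketch of that wrapping in Java. The names SaveFailedException and OrderStore are hypothetical, and the SQLException is simulated rather than coming from a real database, so treat this as an illustration and not a finished persistence layer.

```java
import java.sql.SQLException;

// Hypothetical persistence-layer exception: it says what failed in terms the caller understands.
class SaveFailedException extends Exception {
    SaveFailedException(String message, Throwable cause) {
        super(message, cause); // keep the original exception as the cause
    }
}

// Hypothetical data access class, used only for this example.
class OrderStore {
    void save(String orderId) throws SaveFailedException {
        try {
            writeRow(orderId);
        } catch (SQLException e) {
            // Wrap the low-level detail instead of discarding it.
            throw new SaveFailedException("Could not save order " + orderId, e);
        }
    }

    private void writeRow(String orderId) throws SQLException {
        // Stand-in for a real JDBC failure.
        throw new SQLException("connection refused");
    }
}
```

The business logic sees “save failed”, while the original SQLException is still available through getCause() for whoever has to debug it.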

Create the Right Types of Exceptions

At the risk of stating the obvious, exceptions come in two varieties: those that must be either declared or handled (checked), and those that need not be (unchecked). When should you use one vs. the other?

Checked exceptions should be used in cases where the caller can reasonably be expected to do something about the exceptional condition, and unchecked exceptions should be used everywhere else. As a concrete example, take our persistence layer exceptions. The user providing invalid login credentials would be a possible checked exception, because the user could correct their credentials. The database being down is a good unchecked exception: the user can’t really do anything about that. The reasons for this rule are pretty straightforward, if a bit subtle.

Checked exceptions require the developers to do something, either to handle the exception, or to pass it along to the higher level. If we overuse checked exceptions it leads to lots of passing the buck, where the exception is simply passed along to the caller, because there’s nothing reasonable to be done about it. If we’re going to pass it all the way back to the originator of the operation without handling it, the exception begins to look very much like an unchecked exception. There is a difference though: the programmer had a bunch of additional things to do that added no value.

Unchecked exceptions don’t require the developers to do anything; they’re simply passed along until someone handles them. Overuse of unchecked exceptions is less severe than overuse of checked exceptions, but creating everything as unchecked begins to feel very much like we’re ignoring any potential benefit of having exception handling. Checked exceptions allow us to force the caller into acknowledging the exception. When should we do this? When the caller can do something about whatever it is that caused the exception.

And so we arrive at our guidance: we use checked exceptions where we expect the caller to be able to fix the problem.
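
As a rough sketch of this guidance, using the login and database examples from above: the class names, the LoginService, and its hard-coded password check are all hypothetical placeholders of mine.

```java
// Checked: the caller can plausibly recover, for example by asking for credentials again.
class InvalidCredentialsException extends Exception {
    InvalidCredentialsException(String message) {
        super(message);
    }
}

// Unchecked: the caller cannot repair a database outage, so it is not forced to catch this.
class DatabaseUnavailableException extends RuntimeException {
    DatabaseUnavailableException(String message) {
        super(message);
    }
}

// Hypothetical service; the password check and the outage flag are placeholders.
class LoginService {
    private final boolean databaseUp;

    LoginService(boolean databaseUp) {
        this.databaseUp = databaseUp;
    }

    // Only the recoverable condition appears in the throws clause.
    void login(String user, String password) throws InvalidCredentialsException {
        if (!databaseUp) {
            throw new DatabaseUnavailableException("database is down");
        }
        if (!"secret".equals(password)) {
            throw new InvalidCredentialsException("bad password for " + user);
        }
    }
}
```

Callers of login() must deal with bad credentials, which they can fix, and are left alone about the outage, which they can’t.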

Catching Exceptions

Put simply, this rule says that when catching exceptions, you must do something. There are cases where an exception can be safely disregarded, but in those cases, you should include a comment indicating why it’s okay that you’re ignoring an exception condition.

So, when you catch an exception, you must handle it, rethrow it, wrap it up in a more meaningful exception and throw that, or include a comment saying why it’s okay to ignore.
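
Two of those options might look something like this. PortParser and its default port are invented for the example; the point is only that each catch block either does something or says why it doesn’t need to.

```java
// Hypothetical helper class; the names and the default value are illustrative only.
class PortParser {
    static final int DEFAULT_PORT = 8080;

    // Handles the exception by falling back to a default, and says why that is acceptable.
    static int parsePortOrDefault(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            // Safe to swallow: the port setting is optional, so a malformed value
            // simply means we use the default.
            return DEFAULT_PORT;
        }
    }

    // Wraps the exception in a more meaningful one and throws that.
    static int parsePortStrict(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a valid port: " + text, e);
        }
    }
}
```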

Use Finally Blocks

This standard is really a gimme, but many developers don’t include finally blocks where they are appropriate, so I include it in the standards to reinforce that you need to look for these. Finally blocks are frequently necessary to clean up resources in the event of an exception, but they don’t get included because we don’t think of them. It’s sufficient in the standard to state “Use finally blocks as appropriate in your code to clean up in the event of an exception.”
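
A small sketch of the idea, using a lock as the resource to clean up. The Counter class is hypothetical; the same shape applies to streams, connections and the like. (For resources that implement AutoCloseable, Java 7 and later also offer try-with-resources, which does the cleanup for you.)

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class, used only to show the try/finally shape.
class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    int incrementBy(int amount) {
        lock.lock();
        try {
            if (amount < 0) {
                throw new IllegalArgumentException("amount must not be negative");
            }
            value += amount;
            return value;
        } finally {
            // Runs on the normal return and when the exception is thrown.
            lock.unlock();
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Without the finally block, the exception path would leave the lock held forever.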

Avoid Data Bombs

A data bomb is a datum that is invalid for its intended use, but that isn’t detected until we attempt to use it. A concrete example would help here. Suppose we have an online commerce system, and it has a “customer” and an “order” object. Orders must have a customer, so we have a mutator (sometimes called a setter) that allows us to set a customer onto an order. If we pass a null to the order object, then we’ve created a data bomb: we have an invalid order object, and we won’t know it until we try to retrieve the customer. Data bombs typically manifest themselves in entity objects, those objects that represent a concept in the business domain, because they tend to be data-rich.

With a data bomb, you find out that you have an invalid condition, but the condition may very well have been created “long ago and far away” in code terms, and in some circumstance that you don’t completely understand. This can make it difficult to determine where the error condition originated, and in extreme cases, can result in data corruption as well.

Data bombs are easy to prevent: all you need to do is check your data as soon as you can. The earlier you catch invalid data, the greater chance you have to fix the problem that caused it. Checking your data to disallow invalid values may be tedious, but it sure beats accessing data, discovering it’s not valid, and having no idea how it got that way.
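
For the order and customer example, checking early can be as simple as the sketch below. The Customer and Order classes are stripped-down stand-ins of my own, and Objects.requireNonNull is just one convenient way to do the check.

```java
import java.util.Objects;

// Minimal stand-in classes for the example.
class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

class Order {
    private Customer customer;

    void setCustomer(Customer customer) {
        // Fail here, where the bad value arrives, not later in getCustomer().
        this.customer = Objects.requireNonNull(customer, "customer must not be null");
    }

    Customer getCustomer() {
        return customer;
    }
}
```

The stack trace now points at the code that passed the null, which is the code you actually need to fix.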

Construct Complete Objects

This rule is related to data bombs above. You should strive to only construct complete objects. If your order object requires a customer, then building an order object should require you to provide a customer. This applies equally well to controller objects: if your business logic controller requires a reference to a data access controller to do its job, then you shouldn’t have the option to create your logic controller without the data controller. As a second-best, you should check as early as possible to verify that this invalid condition doesn’t exist, but this again can result in having an invalid condition, but not really knowing how and when it got that way.
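
Sketched in Java, with the same caveat that these are minimal, made-up classes, a complete order might look like this:

```java
import java.util.Objects;

// Minimal stand-in classes for the example.
class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

class Order {
    // final: set exactly once, in the constructor, and never null afterwards.
    private final Customer customer;

    // There is no way to build an Order without supplying a Customer.
    Order(Customer customer) {
        this.customer = Objects.requireNonNull(customer, "an order requires a customer");
    }

    Customer getCustomer() {
        return customer;
    }
}
```

Because there is no no-argument constructor and no setter, an order without a customer simply cannot exist.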

Complexity

Cyclomatic complexity is a complicated name for a pretty simple idea: how many independent paths are there through your code? If your code goes straight from point A to point B (from start of a method to return) without any branches or loops, it gets a score of “1”. For each time you have a fork in the road, such as a loop or an if, you add one.

The more complicated your code is, the more difficult it is to understand and to thoroughly test. Some experts claim that any cyclomatic complexity number (CCN) score over 10 is so complicated as to be untestable. Although the merits of this position are debatable, you should strive for as low a complexity in your code as possible.
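
To make the counting concrete, here is a small, made-up method with its forks marked. ShippingCalculator and its pricing rules are purely illustrative, and different tools count some constructs slightly differently.

```java
// Hypothetical example: the business rules are invented to give the method a few branches.
class ShippingCalculator {
    // Start at 1 for the straight-line path, then add one per fork:
    // the for loop, the if inside it, and the final if. Score: 4.
    static int shippingCost(int[] itemWeights, boolean express) {
        int cost = 0;
        for (int weight : itemWeights) { // +1
            if (weight > 10) {           // +1
                cost += 5;
            } else {
                cost += 2;
            }
        }
        if (express) {                   // +1
            cost *= 2;
        }
        return cost;
    }
}
```

A score of 4 is comfortably low; it’s when a single method creeps into the teens that testing every path gets painful.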

Brace Your Blocks

This is another gimme: all blocks of code controlled by an if statement, while statement, or other control statement should have braces. This is really only a question if the block consists of a single statement: if it’s more than one statement, you must already use braces, because control statements only control the next statement. The braces create a “virtual statement” consisting of all the enclosed statements.

So why the big deal over braces, given that I think there’s limited value arguing over where they’re placed? It has to do with maintainability. If you place braces around your statements, and you go back later to add additional code to the block, you don’t run the risk of forgetting to put the braces in.

Another benefit arises when you have complicated, nested control structures. Using braces in this case ensures that your code works the way you think it does.
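
Here’s the maintenance trap in miniature. The Discounts class is invented for the example: someone added a second line to an unbraced if, and the indentation lies about what the code does.

```java
// Hypothetical example of a statement added to an unbraced if.
class Discounts {
    // Without braces, only the first statement belongs to the if.
    static int unbraced(int total, boolean member) {
        int discount = 0;
        if (member)
            discount = 10;
            total = total - 5; // indented like part of the if, but always runs
        return total - discount;
    }

    // With braces, both statements are controlled by the if.
    static int braced(int total, boolean member) {
        int discount = 0;
        if (member) {
            discount = 10;
            total = total - 5;
        }
        return total - discount;
    }
}
```

Both versions compile without complaint, and they agree for members. For non-members, the unbraced one quietly takes 5 off the total anyway.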

Pass Meaningful Arguments

Arguments passed to methods should have meaningful types. If you don’t follow this rule, you’ll be left trying to figure out which of the seven string arguments you’re passing to the foo() method is which. It’s also usually a symptom of not creating enumerated types when you should.
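
A quick sketch of the difference, with hypothetical enums and a hypothetical OrderRequest class:

```java
// Hypothetical enumerated types replacing free-form strings.
enum ShippingSpeed { STANDARD, EXPRESS }

enum PaymentMethod { CREDIT_CARD, PURCHASE_ORDER }

class OrderRequest {
    // Easy to call wrongly: nothing stops a caller from swapping the two strings.
    static String describeWithStrings(String shippingSpeed, String paymentMethod) {
        return shippingSpeed + " shipping, paid by " + paymentMethod;
    }

    // The compiler rejects arguments passed in the wrong order.
    static String describe(ShippingSpeed speed, PaymentMethod payment) {
        return speed + " shipping, paid by " + payment;
    }
}
```

With the string version, describeWithStrings("CREDIT_CARD", "EXPRESS") compiles and produces nonsense; the typed version won’t let you make that mistake.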

Apply Meaningful Polymorphism

Polymorphic methods, in the sense I mean here, are those that have the same name, but take different arguments (Java calls this method overloading). Continuing our example of an online commerce application, you may have a pair of methods called checkout, one which accepts a P.O. number, and a second which accepts credit card information.

All methods with the same name should perform the same logical function, but in different ways. This way, you avoid confusion about the purpose of a method.
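
The checkout pair might be sketched like this. PurchaseOrder, CreditCard and the returned strings are placeholders I made up; a real checkout would obviously do more than build a message.

```java
// Minimal stand-in types for the two ways of paying.
class PurchaseOrder {
    final String number;

    PurchaseOrder(String number) {
        this.number = number;
    }
}

class CreditCard {
    final String number;

    CreditCard(String number) {
        this.number = number;
    }
}

class Checkout {
    // Both methods perform the same logical function: complete the purchase.
    static String checkout(PurchaseOrder po) {
        return "Charged to P.O. " + po.number;
    }

    static String checkout(CreditCard card) {
        String n = card.number;
        String lastFour = n.substring(Math.max(0, n.length() - 4));
        return "Charged to card ending in " + lastFour;
    }
}
```

What you want to avoid is a third checkout method that, say, only validates the cart: same name, different job.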

Create Methods that Perform Single Functions

Methods should perform a single logical function, rather than performing a number of different functions. If you disregard this rule, you will tend to accumulate multi-purpose methods, with complicated internal logic that determines what the method does. As we’ve already pointed out, complicated methods have high cyclomatic complexity, and are difficult to understand, maintain and debug.

Create Cohesive Objects

Cohesive objects are those that contain only related data or methods. Cohesion can be measured by static code analysis tools, and is determined by analyzing the methods to see whether distinct, separate groups of attributes are accessed by different methods. An object that is cohesive serves a single purpose, whereas one that is not typically serves more than one.

Avoid Naming Arguments after Fields

There’s an idiom in the Java language of naming the formal parameter of a setter with the same name as a field, and using the “this” qualifier to distinguish the two. That’s a lot of technical terminology, so here’s an example:

    public void setInterestRate(int interestRate) {
        this.interestRate = interestRate;
    }

This is perfectly valid syntax, but imagine for the moment that the “this” keyword gets forgotten. In that case, the parameter is simply assigned to itself and the field is never set, so the setter won’t do anything, but again, it’s perfectly valid syntax. I once spent an entire day trying to find this particular problem. I think it’s good practice to name the formal parameter and the field differently, just to avoid this little gotcha.
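
Here are the bug and the fix side by side, in a made-up Loan class. The term field and the “newTerm” parameter name are just my choices for the example.

```java
// Hypothetical class showing the forgotten-"this" problem.
class Loan {
    private int interestRate;
    private int term;

    // Bug: "this." was forgotten, so the parameter is assigned to itself
    // and the field never changes.
    void setInterestRate(int interestRate) {
        interestRate = interestRate;
    }

    // A different parameter name removes the ambiguity, with or without "this.".
    void setTerm(int newTerm) {
        term = newTerm;
    }

    int getInterestRate() {
        return interestRate;
    }

    int getTerm() {
        return term;
    }
}
```

The broken setter compiles, runs, and leaves the interest rate at its default of zero. Some compilers and IDEs can warn about the self-assignment, but you shouldn’t count on it.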

Don’t Reinvent the Wheel

Don’t set off to recode a solution to a problem that’s already been solved, unless there’s a really compelling reason to do so. One of the wonderful strengths of the Java language is its namespace support (packages), which makes it possible to incorporate all kinds of third-party solutions to well-understood, common problems. Failing to leverage that is inefficient.

Don’t Preserve Code that’s Commented Out

You shouldn’t be keeping blocks of code that are commented out, particularly when you have a source code control system. Leaving large blocks of commented-out code is inviting trouble. You can mistake commented-out code for live code, or you can accidentally remove a comment marker and cause mayhem.

Keep Inheritance Shallow

Inheritance, the creation of hierarchies of types of objects, should typically be fairly shallow, no more than 4 or 5 “layers”. A deep inheritance hierarchy is a sign of over-complicated design. An exception to this rule is exception hierarchies, which can be richer and involve more layers.

Whitespace is FREE

Whitespace (carriage returns, tabs and spaces) is free, and should be used liberally to improve the readability of your code. This particularly applies to empty lines in the bodies of methods to separate logical blocks.

Include Pictures

As a final point, wherever possible, your coding standard should include examples to help illustrate your rules.

Acknowledgements and Parting Thoughts

These coding rules, or guidelines if you prefer, are only a subset of all the possible ones that could be in a robust coding standard, but they provide a reasonable starting point. We could add so much more, but we need to be careful to avoid being overly prescriptive as well: the more that we prohibit, the higher the likelihood of exceptions to the rule.

These rules are inspired by 20 years of experience in the field, by the work of Josh Bloch and his excellent book Effective Java, by tools such as FindBugs and PMD, and I’m sure by other sources that I’ve forgotten as well. These second generation coding standards focus not on making your code uniform, but on improving maintainability, and on preventing errors. Hopefully, you will find value in them as well.

Comments

Comment by Andy Kailhofer

2008-03-17 22:21:47

You know very well how I feel about Java, but these are excellent rules-I-mean-guidelines for any coding standard. They apply just as well to Perl or Python (OK, plus or minus a little syntactic whitespace). Bravo.

Comment by Keith McMillan

2008-03-18 19:19:06

Thanks, Andy! I’m glad you liked them. If you create such a standard for Perl or Python, I’d be interested to see it, although I don’t speak Python…