18.10. Developer Guidelines

18.10.1. Codelines

Most development is done on the master branch, which is named master in the Git repository. Previously, HBase used Subversion, in which the master branch was called TRUNK. Branches exist for minor releases, and important features and bug fixes are often back-ported.

18.10.2. Release Managers

Each maintained release branch has a release manager, who volunteers to coordinate new features and bug fixes are backported to that release. The release managers are committers. If you would like your feature or bug fix to be included in a given release, communicate with that release manager. If this list goes out of date or you can't reach the listed person, reach out to someone else on the list.

Note

End-of-life releases are not included in this list.

Table 18.1. Release Managers

ReleaseRelease Manager

0.98

Andrew Purtell

1.0

Enis Soztutar


18.10.3. Code Standards

See Section 18.3.1.1, “Code Formatting” and Section 18.10.3.2, “Code Formatting Conventions”.

18.10.3.1. Interface Classifications

Interfaces are classified both by audience and by stability level. These labels appear at the head of a class. The conventions followed by HBase are inherited by its parent project, Hadoop.

The following interface classifications are commonly used:

@InterfaceAudience

@InterfaceAudience.Public

APIs for users and HBase applications. These APIs will be deprecated through major versions of HBase.

@InterfaceAudience.Private

APIs for HBase internals developers. No guarantees on compatibility or availability in future versions. Private interfaces do not need an @InterfaceStability classification.

@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)

APIs for HBase coprocessor writers. As of HBase 0.92/0.94/0.96/0.98 this api is still unstable. No guarantees on compatibility with future versions.

No @InterfaceAudience Classification

Packages without an @InterfaceAudience label are considered private. Mark your new packages if publicly accessible.

Excluding Non-Public Interfaces from API Documentation

Only interfaces classified @InterfaceAudience.Public should be included in API documentation (Javadoc). Committers must add new package excludes ExcludePackageNames section of the pom.xml for new packages which do not contain public classes.

@InterfaceStability

@InterfaceStability is important for packages marked @InterfaceAudience.Public.

@InterfaceStability.Stable

Public packages marked as stable cannot be changed without a deprecation path or a very good reason.

@InterfaceStability.Unstable

Public packages marked as unstable can be changed without a deprecation path.

@InterfaceStability.Evolving

Public packages marked as evolving may be changed, but it is discouraged.

No @InterfaceStability Label

Public classes with no @InterfaceStability label are discouraged, and should be considered implicitly unstable.

If you are unclear about how to mark packages, ask on the development list.

18.10.3.2. Code Formatting Conventions

Please adhere to the following guidelines so that your patches can be reviewed more quickly. These guidelines have been developed based upon common feedback on patches from new contributors.

See the Code Conventions for the Java Programming Language for more information on coding conventions in Java.

18.10.3.2.1. Space Invaders

Do not use extra spaces around brackets. Use the second style, rather than the first.

if ( foo.equals( bar ) ) {     // don't do this
if (foo.equals(bar)) {
foo = barArray[ i ];     // don't do this
foo = barArray[i];
18.10.3.2.2. Auto Generated Code

Auto-generated code in Eclipse often uses bad variable names such as arg0. Use more informative variable names. Use code like the second example here.

 public void readFields(DataInput arg0) throws IOException {    // don't do this
   foo = arg0.readUTF();                                       // don't do this
 public void readFields(DataInput di) throws IOException {
   foo = di.readUTF();
18.10.3.2.3. Long Lines

Keep lines less than 100 characters. You can configure your IDE to do this automatically.

Bar bar = foo.veryLongMethodWithManyArguments(argument1, argument2, argument3, argument4, argument5, argument6, argument7, argument8, argument9);  // don't do this
Bar bar = foo.veryLongMethodWithManyArguments(
 argument1, argument2, argument3,argument4, argument5, argument6, argument7, argument8, argument9);
18.10.3.2.4. Trailing Spaces

Trailing spaces are a common problem. Be sure there is a line break after the end of your code, and avoid lines with nothing but whitespace. This makes diffs more meaningful. You can configure your IDE to help with this.

Bar bar = foo.getBar();     <--- imagine there is an extra space(s) after the semicolon.
18.10.3.2.5. API Documentation (Javadoc)

This is also a very common feedback item. Don't forget Javadoc!

Javadoc warnings are checked during precommit. If the precommit tool gives you a '-1', please fix the javadoc issue. Your patch won't be committed if it adds such warnings.

18.10.3.2.6. Findbugs

Findbugs is used to detect common bugs pattern. It is checked during the precommit build by Apache's Jenkins. If errors are found, please fix them. You can run findbugs locally with mvn findbugs:findbugs, which will generate the findbugs files locally. Sometimes, you may have to write code smarter than findbugs. You can annotate your code to tell findbugs you know what you're doing, by annotating your class with the following annotation:

@edu.umd.cs.findbugs.annotations.SuppressWarnings(
value="HE_EQUALS_USE_HASHCODE",
justification="I know what I'm doing")

It is important to use the Apache-licensed version of the annotations.

18.10.3.2.7. Javadoc - Useless Defaults

Don't just leave the @param arguments the way your IDE generated them.:

  /**
   *
   * @param bar             <---- don't do this!!!!
   * @return                <---- or this!!!!
   */
  public Foo getFoo(Bar bar);

Either add something descriptive to the @param and @return lines, or just remove them. The preference is to add something descriptive and useful.

18.10.3.2.8. One Thing At A Time, Folks

If you submit a patch for one thing, don't do auto-reformatting or unrelated reformatting of code on a completely different area of code.

Likewise, don't add unrelated cleanup or refactorings outside the scope of your Jira.

18.10.3.2.9. Ambigious Unit Tests

Make sure that you're clear about what you are testing in your unit tests and why.

18.10.3.2.10. Implementing Writable

Applies pre-0.96 only

In 0.96, HBase moved to protocol buffers (protobufs). The below section on Writables applies to 0.94.x and previous, not to 0.96 and beyond.

Every class returned by RegionServers must implement the Writable interface. If you are creating a new class that needs to implement this interface, do not forget the default constructor.

18.10.4. Invariants

We don't have many but what we have we list below. All are subject to challenge of course but until then, please hold to the rules of the road.

18.10.4.1. No permanent state in ZooKeeper

ZooKeeper state should transient (treat it like memory). If ZooKeeper state is deleted, hbase should be able to recover and essentially be in the same state.

Exceptions

  • There are currently a few exceptions that we need to fix around whether a table is enabled or disabled.

  • Replication data is currently stored only in ZooKeeper. Deleting ZooKeeper data related to replication may cause replication to be disabled. Do not delete the replication tree, /hbase/replication/.

    Warning

    Replication may be disrupted and data loss may occur if you delete the replication tree (/hbase/replication/) from ZooKeeper. Follow progress on this issue at HBASE-10295.

18.10.5. Running In-Situ

If you are developing Apache HBase, frequently it is useful to test your changes against a more-real cluster than what you find in unit tests. In this case, HBase can be run directly from the source in local-mode. All you need to do is run:

${HBASE_HOME}/bin/start-hbase.sh

This will spin up a full local-cluster, just as if you had packaged up HBase and installed it on your machine.

Keep in mind that you will need to have installed HBase into your local maven repository for the in-situ cluster to work properly. That is, you will need to run:

mvn clean install -DskipTests

to ensure that maven can find the correct classpath and dependencies. Generally, the above command is just a good thing to try running first, if maven is acting oddly.

18.10.6. Adding Metrics

After adding a new feature a developer might want to add metrics. HBase exposes metrics using the Hadoop Metrics 2 system, so adding a new metric involves exposing that metric to the hadoop system. Unfortunately the API of metrics2 changed from hadoop 1 to hadoop 2. In order to get around this a set of interfaces and implementations have to be loaded at runtime. To get an in-depth look at the reasoning and structure of these classes you can read the blog post located here. To add a metric to an existing MBean follow the short guide below:

18.10.6.1. Add Metric name and Function to Hadoop Compat Interface.

Inside of the source interface the corresponds to where the metrics are generated (eg MetricsMasterSource for things coming from HMaster) create new static strings for metric name and description. Then add a new method that will be called to add new reading.

18.10.6.2. Add the Implementation to Both Hadoop 1 and Hadoop 2 Compat modules.

Inside of the implementation of the source (eg. MetricsMasterSourceImpl in the above example) create a new histogram, counter, gauge, or stat in the init method. Then in the method that was added to the interface wire up the parameter passed in to the histogram.

Now add tests that make sure the data is correctly exported to the metrics 2 system. For this the MetricsAssertHelper is provided.

18.10.7. Git Best Practices

  • Use the correct method to create patches. See Section 18.10.8, “Submitting Patches”.

  • Avoid git merges. Use git pull --rebase or git fetch followed by git rebase.

  • Do not use git push --force. If the push does not work, fix the problem or ask for help.

Please contribute to this document if you think of other Git best practices.

18.10.7.1. rebase_all_git_branches.sh

The dev-support/rebase_all_git_branches.sh script is provided to help keep your Git repository clean. Use the -h parameter to get usage instructions. The script automatically refreshes your tracking branches, attempts an automatic rebase of each local branch against its remote branch, and gives you the option to delete any branch which represents a closed HBASE- JIRA. The script has one optional configuration option, the location of your Git directory. You can set a default by editing the script. Otherwise, you can pass the git directory manually by using the -d parameter, followed by an absolute or relative directory name, or even '.' for the current working directory. The script checks the directory for sub-directory called .git/, before proceeding.

18.10.8. Submitting Patches

HBase moved to GIT from SVN. Until we develop our own documentation for how to contribute patches in our new GIT context, caveat the fact that we have a different branching model and that we don't currently do the merge practice described in the following, the accumulo doc on how to contribute and develop after our move to GIT is worth a read. See also Section 18.10.7, “Git Best Practices”.

If you are new to submitting patches to open source or new to submitting patches to Apache, start by reading the On Contributing Patches page from Apache Commons Project. It provides a nice overview that applies equally to the Apache HBase Project.

18.10.8.1. Create Patch

The script dev-support/make_patch.sh has been provided to help you adhere to patch-creation guidelines. The script has the following syntax:

$ make_patch.sh [-a] [-p <patch_dir>]
  1. If you do not pass a patch_dir, the script defaults to ~/patches/. If the patch_dir does not exist, it is created.

  2. By default, if an existing patch exists with the JIRA ID, the version of the new patch is incremented (HBASE-XXXX-v3.patch). If the -a option is passed, the version is not incremented, but the suffix -addendum is added (HBASE-XXXX-v2-addendum.patch). A second addendum to a given version is not supported.

  3. Detects whether you have more than one local commit on your branch. If you do, the script offers you the chance to run git rebase -i to squash the changes into a single commit so that it can use git format-patch. If you decline, the script uses git diff instead. The patch is saved in a configurable directory and is ready to be attached to your JIRA.

Patching Workflow

  • Always patch against the master branch first, even if you want to patch in another branch. HBase committers always apply patches first to the master branch, and backport if necessary.

  • Submit one single patch for a fix. If necessary, squash local commits to merge local commits into a single one first. See this Stack Overflow question for more information about squashing commits.

  • The patch should have the JIRA ID in the name. If you are generating from a branch, include the target branch in the filename. A common naming scheme for patches is:

    HBASE-XXXX.patch
    HBASE-XXXX-0.90.patch     # to denote that the patch is against branch 0.90
    HBASE-XXXX-v3.patch       # to denote that this is the third version of the patch
  • To submit a patch, first create it using one of the methods in Methods to Create Patches. Next, attach the patch to the JIRA (one patch for the whole fix), using the MoreAttach Files dialog. Next, click the Patch Available button, which triggers the Hudson job which checks the patch for validity.

    Please understand that not every patch may get committed, and that feedback will likely be provided on the patch.

  • If your patch is longer than a single screen, also attach a Review Board to the case. See Section 18.10.8.4, “ReviewBoard”.

  • If you need to revise your patch, leave the previous patch file(s) attached to the JIRA, and upload the new one, following the naming conventions in Section 18.10.8.1, “Create Patch”. Cancel the Patch Available flag and then re-trigger it, by toggling the Patch Available button in JIRA. JIRA sorts attached files by the time they were attached, and has no problem with multiple attachments with the same name. However, at times it is easier to refer to different version of a patch if you add -vX, where the X is the version (starting with 2).

  • If you need to submit your patch against multiple branches, rather than just master, name each version of the patch with the branch it is for, following the naming conventions in Section 18.10.8.1, “Create Patch”.

Methods to Create Patches

Eclipse

Select the TeamCreate Patch menu item.

Git

git format-patch is preferred because it preserves commit messages. Use git rebase -i first, to combine (squash) smaller commits into a single larger one.

Note

Do not use --no-prefix, even if you were in the habit of doing it previously.

  • git format-patch origin/master --stdout > HBASE-XXXX.patch
  • git diff origin/master > HBASE-XXXX.patch

If your branch is based upon a different remote branch, replace origin/master with the remote to compare against.

If you are new to creating patches, it's a good idea to check out a fresh branch and try to apply your patch to it. If you used git format-patch, apply the patch using git am. Otherwise, use git apply. If the patch does not apply correctly, fix the problem and try again.

Subversion
svn diff > HBASE-XXXX.patch

Make sure you review Section 18.3.1.1, “Code Formatting” and Section 18.10.3.2, “Code Formatting Conventions” for code style. If your patch was generated incorrectly or your code does not adhere to the code formatting guidelines, you may be asked to redo some work.

18.10.8.2. Unit Tests

Yes, please. Please try to include unit tests with every code patch (and especially new classes and large changes). Make sure unit tests pass locally before submitting the patch.

Also, see Section 19.2, “Mockito”.

If you are creating a new unit test class, notice how other unit test classes have classification/sizing annotations at the top and a static method on the end. Be sure to include these in any new unit test files you generate. See Section 18.9, “Tests” for more on how the annotations work.

18.10.8.3. Integration Tests

Significant new features should provide an integration test in addition to unit tests, suitable for exercising the new feature at different points in its configuration space.

18.10.8.4. ReviewBoard

Patches larger than one screen, or patches that will be tricky to review, should go through ReviewBoard.

Procedure 18.3. Use ReviewBoard

  1. Register for an account if you don't already have one. It does not use the credentials from issues.apache.org. Log in.

  2. Click New Review Request.

  3. Choose the hbase-git repository. Click Choose File to select the diff and optionally a parent diff. Click Create Review Request.

  4. Fill in the fields as required. At the minimum, fill in the Summary and choose hbase as the Review Group. If you fill in the Bugs field, the review board links back to the relevant JIRA. The more fields you fill in, the better. Click Publish to make your review request public. An email will be sent to everyone in the hbase group, to review the patch.

  5. Back in your JIRA, click MoreLinkWeb Link, and paste in the URL of your ReviewBoard request. This attaches the ReviewBoard to the JIRA, for easy access.

  6. To cancel the request, click CloseDiscarded.

For more information on how to use ReviewBoard, see the ReviewBoard documentation.

18.10.8.5. Guide for HBase Committers

18.10.8.5.1. New committers

New committers are encouraged to first read Apache's generic committer documentation:

18.10.8.5.2. Review

HBase committers should, as often as possible, attempt to review patches submitted by others. Ideally every submitted patch will get reviewed by a committer within a few days. If a committer reviews a patch they have not authored, and believe it to be of sufficient quality, then they can commit the patch, otherwise the patch should be cancelled with a clear explanation for why it was rejected.

The list of submitted patches is in the HBase Review Queue, which is ordered by time of last modification. Committers should scan the list from top to bottom, looking for patches that they feel qualified to review and possibly commit.

For non-trivial changes, it is required to get another committer to review your own patches before commit. Use the Submit Patch button in JIRA, just like other contributors, and then wait for a +1 response from another committer before committing.

18.10.8.5.3. Reject

Patches which do not adhere to the guidelines in HowToContribute and to the code review checklist should be rejected. Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review.

18.10.8.5.4. Commit

Committers commit patches to the Apache HBase GIT repository.

Before you commit!!!!

Make sure your local configuration is correct, especially your identity and email. Examine the output of the $ git config --list command and be sure it is correct. See this GitHub article, Set Up Git if you need pointers.

When you commit a patch, please:

  1. Include the Jira issue id in the commit message, along with a short description of the change and the name of the contributor if it is not you. Be sure to get the issue ID right, as this causes Jira to link to the change in Git (use the issue's "All" tab to see these).

  2. Commit the patch to a new branch based off master or other intended branch. It's a good idea to call this branch by the JIRA ID. Then check out the relevant target branch where you want to commit, make sure your local branch has all remote changes, by doing a git pull --rebase or another similar command, cherry-pick the change into each relevant branch (such as master), and do git push <remote-server> <remote-branch>.

    Warning

    If you do not have all remote changes, the push will fail. If the push fails for any reason, fix the problem or ask for help. Do not do a git push --force.

    Before you can commit a patch, you need to determine how the patch was created. The instructions and preferences around the way to create patches have changed, and there will be a transition periond.

    Determine How a Patch Was Created

    • If the first few lines of the patch look like the headers of an email, with a From, Date, and Subject, it was created using git format-patch. This is the preference, because you can reuse the submitter's commit message. If the commit message is not appropriate, you can still use the commit, then run the command git rebase -i origin/master, and squash and reword as appropriate.

    • If the first line of the patch looks similar to the following, it was created using git diff without --no-prefix. This is acceptable too. Notice the a and b in front of the file names. This is the indication that the patch was not created with --no-prefix.

      diff --git a/src/main/docbkx/developer.xml b/src/main/docbkx/developer.xml
    • If the first line of the patch looks similar to the following (without the a and b), the patch was created with git diff --no-prefix and you need to add -p0 to the git apply command below.

      diff --git src/main/docbkx/developer.xml src/main/docbkx/developer.xml

    Example 18.3. Example of Committing a Patch

    One thing you will notice with these examples is that there are a lot of git pull commands. The only command that actually writes anything to the remote repository is git push, and you need to make absolutely sure you have the correct versions of everything and don't have any conflicts before pushing. The extra git pull commands are usually redundant, but better safe than sorry.

    The first example shows how to apply a patch that was generated with git format-patch and apply it to the master and branch-1 branches.

    The directive to use git format-patch rather than git diff, and not to use --no-prefix, is a new one. See the second example for how to apply a patch created with git diff, and educate the person who created the patch.

    $ git checkout -b HBASE-XXXX
    $ git am ~/Downloads/HBASE-XXXX-v2.patch
    $ git checkout master
    $ git pull --rebase
    $ git cherry-pick <sha-from-commit>
    # Resolve conflicts if necessary or ask the submitter to do it
    $ git pull --rebase          # Better safe than sorry
    $ git push origin master
    $ git checkout branch-1
    $ git pull --rebase
    $ git cherry-pick <sha-from-commit>
    # Resolve conflicts if necessary
    $ git pull --rebase          # Better safe than sorry
    $ git push origin branch-1
    $ git branch -D HBASE-XXXX
                                    

    This example shows how to commit a patch that was created using git diff without --no-prefix. If the patch was created with --no-prefix, add -p0 to the git apply command.

    $ git apply ~/Downloads/HBASE-XXXX-v2.patch 
    $ git commit -m "HBASE-XXXX Really Good Code Fix (Joe Schmo)" -a # This extra step is needed for patches created with 'git diff'
    $ git checkout master
    $ git pull --rebase
    $ git cherry-pick <sha-from-commit>
    # Resolve conflicts if necessary or ask the submitter to do it
    $ git pull --rebase          # Better safe than sorry
    $ git push origin master
    $ git checkout branch-1
    $ git pull --rebase
    $ git cherry-pick <sha-from-commit>
    # Resolve conflicts if necessary or ask the submitter to do it
    $ git pull --rebase           # Better safe than sorry
    $ git push origin branch-1
    $ git branch -D HBASE-XXXX

  3. Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but please only set a single fix version for each branch where the change was committed, the earliest release in that branch in which the change will appear.

18.10.8.5.4.1. Commit Message Format

The commit message should contain the JIRA ID and a description of what the patch does. The preferred commit message format is:

<jira-id> <jira-title> (<contributor-name-if-not-commit-author>)
HBASE-12345 Fix All The Things (jane@example.com)

If the contributor used git format-patch to generate the patch, their commit message is in their patch and you can use that, but be sure the JIRA ID is at the front of the commit message, even if the contributor left it out.

18.10.8.5.4.2. Add Amending-Author when a conflict cherrypick backporting

We've established the practice of committing to trunk and then cherry picking back to branches whenever possible. When there is a minor conflict we can fix it up and just proceed with the commit. The resulting commit retains the original author. When the amending author is different from the original committer, add notice of this at the end of the commit message as: Amending-Author: Author <committer&apache> See discussion at HBase, mail # dev - [DISCUSSION] Best practice when amending commits cherry picked from master to branch.

18.10.8.5.4.3. Committers are responsible for making sure commits do not break the build or tests

If a committer commits a patch, it is their responsibility to make sure it passes the test suite. It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests, but ultimately, a contributor cannot be expected to be aware of all the particular vagaries and interconnections that occur in a project like HBase. A committer should.

18.10.8.5.4.4. Patching Etiquette

In the thread HBase, mail # dev - ANNOUNCEMENT: Git Migration In Progress (WAS => Re: Git Migration), it was agreed on the following patch flow

  1. Develop and commit the patch against trunk/master first.

  2. Try to cherry-pick the patch when backporting if possible.

  3. If this does not work, manually commit the patch to the branch.

18.10.8.5.4.5. Merge Commits

Avoid merge commits, as they create problems in the git history.

18.10.8.5.4.6. Committing Documentation

See Appendix A, Contributing to Documentation.

18.10.8.6. Dialog

Committers should hang out in the #hbase room on irc.freenode.net for real-time discussions. However any substantive discussion (as with any off-list project-related discussion) should be re-iterated in Jira or on the developer list.

18.10.8.7. Do not edit JIRA comments

Misspellings and/or bad grammar is preferable to the disruption a JIRA comment edit causes: See the discussion at Re:(HBASE-451) Remove HTableDescriptor from HRegionInfo

comments powered by Disqus