Unofficial Erlang Planet

July 29, 2010

Damien Katz

Getting Your Open Source Project to 1.0

The project I founded, Apache CouchDB, recently hit 1.0. I'm very proud :)

It's been a long time, but we finally produced a release that's complete, performs well and is rock solid.

Already CouchDB is on over 10 million machines. It's used by big respected websites (like the BBC) and groundbreaking organizations (Mozilla and Canonical). We run on most *nix, OS X, Windows, and even Android phones. We have dozens of frameworks and client libraries available. There are 2 books available for sale right now. There is a venture capital backed startup, Cloudant, that offers CouchDB hosting and scales to huge datasets. And I'm CEO of another venture backed ($2 million invested) 12 person start-up, Couchio.

So how did I get here? It took a lot of time and effort (almost 5 years!), and the help of a lot of people. Here are some tips on what it took to get CouchDB to 1.0.

Why?

Successful open source projects need a reason for being. You need to decide why you are creating a project and what problems it solves. Whether it's one or many reasons, you need to figure out what they are and explain them.

Perhaps you are making something new, that hasn't existed. Why hasn't it existed before? No one had the idea? No one had the will to carry it through? Or maybe you are making something that's already in existence, like an HTTP server. What are your reasons? Simpler, faster, more features, different license?

If you are just doing it as a learning exercise, that's fine. But don't expect to attract a community until you can explain why it's useful beyond your own personal goals.

With CouchDB, my reasons were:
1. A schemaless document database with views, bi-directional replication and conflict detection to enable disconnected operation would be really useful.
2. I wanted to understand more about creating distributed systems and database internals.

No one cares about reason #2 except for me. But the first reason is compelling.

Make sure you can tell people why your project exists and what it's good for. And put the reasons on your project site where people can find them.

Code Comes First

Don't start a project unless you have a deep commitment to being a strong coder.

Now I'm not saying you must be a strong coder to participate in a project. Not at all. I'm saying that you must be a strong coder to lead one. Maybe you'll get lucky and somehow attract some really good coders to your project. But most really good coders go to projects with already solid codebases, or start their own.

Also, you don't have to be a strong coder when you start out, but you should know the basics and have a strong desire to learn and get better. Don't expect to attract anyone to your project until you have a substantial amount of working code that isn't a big ball of spaghetti.

With CouchDB, I always emphasized the quality of the high-level design and code implementation. We cannot under any circumstances lose or corrupt your committed data, or get things into an inconsistent state. Reliability and durability are absolutely imperative. Any design or implementation that doesn't meet these goals doesn't make it into the project.

Some projects might not have an emphasis on the reliability, but on absolute performance. That's a fine choice to make, but make sure your users know what they are trading off. And then actually deliver on the performance.

As the project moves along, you will need to ensure the code quality (reliability, performance, resource usage, etc.) is improving over time. If you aren't a good coder, you won't be able to do this.

Know What You Aren't

Almost as important as knowing what your project is trying to accomplish is knowing what it isn't trying to accomplish.

When your project starts to get traction, but before it's done, you'll get a lot of people who want the project to work more like things they've used in the past. New users might think your goals and abilities are cool, but they'd trade it all for just a little more. They'll want everything your project does, plus a pony.

The problem is feature and scope creep. Even if you are successfully keeping the project on track, the community may get slowed down dealing with people trying to make it something it's not. Stating clearly what your project isn't trying to do or be helps make it much easier to explain what you can't implement or change.

Now, you can't define everything your project isn't. (It's not a video game. It's not accounting software. It's not a banana. It's not a rainbow. etc.). But you can find the things it's related to, overlaps with, or might be confused with, and explicitly say it's not those things.

With CouchDB, because we are a database, people often asked us to add features that were in traditional RDBMS's, but didn't fit well with the CouchDB data model. Not being intimately familiar with CouchDB's model and how it all fits together, they don't realize that what they're asking for simply doesn't work. But because we explicitly stated on the project site we aren't a relational database and aren't trying to replace relational databases, it made it much easier to explain why those features weren't a good fit for what CouchDB is trying to accomplish.

So if you don't clearly define what your project isn't, often people will try to make it into those things. This can damage the community, as moving forward is slower and people feel like they aren't being listened to. Be explicit about what you aren't, and it becomes much easier to focus on what you actually are.

Don't Do Everything (Well)

So you are a superstar coder, your code is clear and concise and high quality, you write clear complete documentation, you create all the tests, and you fix every bug. You are awesome!

Thing is, you might be awesome, but until you actually get a community behind the project, the project will be limited in an absolute sense by what a single person can produce. And if you are doing everything, that's not a whole lot. Trying to do everything well means you'll probably never actually release anything.

Unfortunately, at first, you _will_ need to do everything. But just don't do everything really well. Instead, you'll have to do some things crappily, and then move on. In addition to writing all the code, you'll need to: Create a project site. Explain your project. Write documentation. Do the releases. Start a bug tracker. Create a mailing list and answer questions. And you'll have to do most of these things poorly if you want to keep moving the ball forward.

You'll have to do some things poorly. But you'll need to pick a few things that you do really well and execute on those things. (The code should be one of the things you do well).

And in everything you do, you'll need to make it easy for others to participate: to send patches, to update and create documentation, and to file bug reports. And make it clear that help is desired.

Don't get hung up on trying to make everything perfect. That just paralyzes you. But by picking a few things to do well, you will attract people to help you with the things you aren't doing well.

Community Wants to Help

Open Source is awesome in the way it attracts people who just want to help make something cool. Many people want to contribute their time, but only if they think their help will amount to something in the long run. They don't want to spend time and effort on something that doesn't yet show potential or might be abandoned if the creators lose interest.

If you have a solid codebase, then it becomes much easier to attract people to your community. If people can recognize there is at least something high quality about your project, but it's lacking in some areas, people will want to help you in those areas. But you have to have the high quality pieces in place. People don't want to be the one excellent contributor to dreck. They'd rather not have their efforts associated at all.

They do want to be a part of something great. They want to add their work and make it even better. They want to contribute to projects where the total excellence of the project reflects well on them and their efforts. They want to make the world a better place, and don't want their efforts wasted.

And people who like making the world a better place are exactly the kind of people you want to attract. You want people to have pride in their contributions, and to feel like they are really positively affecting the things they care about. Those people have lots of projects to choose from where they can add their time and talents. If they feel their efforts on your project are wasted, they are gone. Make sure the people who show a strong desire to contribute aren't ignored, and feel like their efforts will eventually amount to something.

Being a part of Apache has helped CouchDB tremendously. Partially it's because Apache has helped our visibility and credibility. But it's also because we've adopted the "Apache Way", which is more focused on the community aspects of a project than on any specific contributor. Without our amazingly active community, CouchDB would be far behind where it is now.

Community Is Often Incompetent

Unfortunately, many people who will want to help you will produce contributions of poor quality. You will have to deal with this "help", and do so diplomatically. The best way to point out the shortcomings with their contributions is to identify what needs improvement without denigrating their overall effort. This can be hard, and many don't want to hear why their efforts aren't up to the project's standards.

Sometimes you have to hurt people's feelings. But it's better to be honest than to have the quality of your project brought down. If they can't handle the feedback, so be it. The good news is the people who do listen to constructive criticism and actually improve the quality of their contributions are incredibly valuable. Look for these people and nurture their involvement.

With CouchDB, we try to listen to all members of our community, but we only grant commit access to the ones who have shown high quality contributions. Our committers are our first line of defense against poor code and design.

Paul Graham Was Right

It seems to make sense to choose a mainstream language for your project. The more mainstream it is, the larger the potential community you can attract. While that's true to an extent, the quality of the community is more important than its absolute size. Much more important.

Using a mainstream language means you are also competing for contributors' time with other projects in the same language. So the pool is large, but in the end, you still have to attract quality developers from other things competing for their attention. And the competition might actually be stronger in that larger pool.

The more mainstream a language, the more likely it is that a random developer knows it because it's what they use at work. They aren't necessarily interested in being more productive, being more reliable, or whatever. They are interested in getting paid, and they choose their language not for elegance, power or performance, but for the number of job openings available.

If you pick a non-mainstream, more esoteric language, you tend to get a higher quality of developer. You tend to find people who absolutely love programming and building, and choose their languages not based on the scale of pay, but because they make the developers and projects more powerful. So while the total pool of contributors is smaller, they tend to give a higher quality of contribution. You get a much better signal to noise ratio.

As Paul Graham explained in Beating the Averages, the exotic languages tend to attract devs who love to learn and expand their toolbox. You'll attract more of the types of devs who don't mind creating new code to fill in the gaps, or diving into source to find a bug. They aren't afraid of what they don't know; they actually get excited by the chance to learn and do something new.

But if you pick enterprisey language X, you might find you are spending more of your time fixing problems and dealing with developers who just don't "get it". If you aren't careful, this can drown your project and bring the total code quality down to the point where you can't find good devs to help you anymore. With the less popular, esoteric languages, that tends to be less of a problem and you get a higher quality of contribution in general.

Use Your Brain

I can keep listing all the stuff we did, but you aren't creating the same project under the same circumstances. Pretty much everything I've said here, we've not followed at some point during the project. Often it was to the detriment of the project, but sometimes it just didn't make sense to blindly follow a rule or guideline.

You have a brain, and using it is the most important thing to remember at any time. Projects can't follow cookie cutter rules. Even the "Apache Way", as I've discovered, means different things to different people, often at different times.

So take my advice here with a grain of salt, and use your brain to figure out what's actually important to you, your project and its community. Good luck!

by Damien Katz at July 29, 2010 06:55 PM

July 26, 2010

Damien Katz

CouchCamp is Coming Soon!

CouchCamp, September 8-10

This is the place to be to learn and hack on Apache CouchDB. In honor of the recent 1.0 release, for a limited time it's only $500, with accommodations.

In addition to unconference style discussions, we've got some great speakers: Selena Deckelmann, Stuart Langridge, Ted Leung, Josh Berkus, Dion Almaer and me :)

One thing I'm really excited to talk about is our work porting CouchDB to mobile platforms. Android, iOS, RIM, etc. We've got some very cool stuff coming :)

by Damien Katz at July 26, 2010 05:10 PM

July 22, 2010

Erlang Inside

Erlang and REST – an interview with Steve Vinoski

Interview recorded at Erlang Factory, where Steve Vinoski discusses the key concepts of RPC frameworks such as CORBA, how they compare to REST, and how Erlang fits into both worlds.

by Chad DePue at July 22, 2010 12:23 PM

July 21, 2010

Erlang Inside

Zotonic rethinks the CMS with Erlang

A chat with Marc Worrell, Lead Architect of Zotonic - a new Content Management System written entirely in Erlang.

by Chad DePue at July 21, 2010 01:36 PM

July 19, 2010

Joe's blog

Adding Health Checks to Deckard from Chef.

Recently, we (at Cloudant) open sourced Deckard, an HTTP content check monitoring system based on CouchDB. One of the best bits about using Couch is that it gives you a ReST API, and with Deckard that API can be used to add new health checks. Doing a simple PUT adds new URLs to monitor.

At Cloudant we love Chef and use it for everything. Chef has things called resources and providers. Resources are abstractions that describe the state you want a machine to be in. Providers perform the actions described by a resource. A good example is the package resource, which uses yum on CentOS and apt-get on Ubuntu. The resource abstracts that away, letting the provider (and node) deal with the specifics of how to install the package. This keeps your recipes nice and DRY: you use the same code to install packages on all sorts of platforms. There are resources and providers for everything from installing packages to executing Erlang code via erl_call (one I wrote).

One resource that works well with Deckard is the HTTP request resource; using it makes it very easy to add health checks from your cookbooks. We use something like the following code to add checks to new nodes at Cloudant:

This code will add the document describing the check to the monitor_content_check database and then create a file so we can use “not_if” and Chef won’t attempt to add the check twice. Pretty cool stuff and even more reason that everything should have an API. Even cooler than this example would be to use Chef Search to do the same thing but I’ll save that for another blog post.
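For readers who don't use Chef, the same pattern (one PUT to the monitor_content_check database, guarded by a marker file so it only happens once) can be sketched in plain Python. This is only an illustration, not the Cloudant recipe: the CouchDB address, the document ID, the document's "url" field and the marker file path are all assumptions, since the post names only the database and the "not_if" guard.

```python
import json
import os
import urllib.request

# Hypothetical values: the post names only the monitor_content_check database.
COUCH_URL = "http://localhost:5984"
DATABASE = "monitor_content_check"


def build_check_request(check_id, url_to_monitor):
    """Build (but do not send) the PUT that stores one check document."""
    # The document layout is invented for illustration; Deckard's may differ.
    body = json.dumps({"url": url_to_monitor}).encode("utf-8")
    return urllib.request.Request(
        f"{COUCH_URL}/{DATABASE}/{check_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def add_check_once(check_id, url_to_monitor, marker_path,
                   send=urllib.request.urlopen):
    """Send the PUT unless the marker file exists; return True if it was sent."""
    if os.path.exists(marker_path):  # plays the role of Chef's not_if guard
        return False
    send(build_check_request(check_id, url_to_monitor))
    # Record success so that later runs skip the request.
    with open(marker_path, "w") as marker:
        marker.write(check_id + "\n")
    return True
```

Calling add_check_once twice with the same marker path sends the request only the first time. The send parameter exists so the function can be exercised without a running CouchDB.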

by joe at July 19, 2010 08:52 PM

July 17, 2010

Steve Vinoski

New Column, New Interview

A couple newsworthy items:

by steve at July 17, 2010 06:01 PM

July 08, 2010

Erlang Inside

PostgreSQL Erlang client library epgsql supports asynchronous messages from LISTEN/NOTIFY events

If you use PostgreSQL with Erlang, you're probably already familiar with the epgsql client library. Something that slipped my attention was the addition of...

by Chad DePue at July 08, 2010 07:27 PM

July 05, 2010

Mickaël Rémond

ProcessOne releases OneCached, a Memcached in erlang

ProcessOne has just released OneCached, a Memcached server and client implementation written in Erlang.

OneCached is a new Memcached server and client implementation written from scratch in Erlang by ProcessOne.

From the Memcached website:

What is Memcached?

Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.

OneCached supports the set, add, replace, get, incr, decr, delete, flush_all and quit commands. It doesn't handle expiration time.
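To give a feel for what those commands look like on the wire, here is a minimal Python sketch that formats requests in the standard memcached text protocol. It is not taken from the OneCached source, the function names are made up for illustration, and it assumes OneCached accepts the same request format as a stock Memcached server. The expiration field is always sent as 0, since expiration isn't handled.

```python
def storage_command(verb, key, value, flags=0):
    """Format a set/add/replace request; value must be bytes."""
    if verb not in ("set", "add", "replace"):
        raise ValueError(f"not a storage command: {verb}")
    # Header fields: verb, key, flags, exptime (fixed at 0), payload length.
    header = f"{verb} {key} {flags} 0 {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def simple_command(verb, *args):
    """Format a one-line request: get, incr, decr, delete, flush_all or quit."""
    words = (verb,) + tuple(str(arg) for arg in args)
    return " ".join(words).encode("ascii") + b"\r\n"
```

For example, storage_command("set", "greeting", b"hello") produces the bytes for a set request, and simple_command("get", "greeting") the matching lookup. Both results could be written to a TCP socket connected to the server.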

You can pull the source code from the public repository at: https://git.process-one.net/onecached

git clone git://git.process-one.net/onecached/mainline.git

To compile, just run make, and to start, just type:

bin/onecachedctl start

OneCached is released under the Erlang Public License (EPL), version 1.1. It is available from ProcessOne Labs.

by Nicolas Vérité at July 05, 2010 08:41 PM

Erlang Inside

New Erlang job board – totally-erlang.com

The Zotonic guys put together a new Erlang job board at http://totally-erlang.com/

by Chad DePue at July 05, 2010 11:27 AM

June 29, 2010

Erlang Inside

erldocs.com updated with R14A support, mochiweb, and available for your own projects

Dale Harvey recently updated erldocs.com with R14A support. If you are doing any Erlang development, or even just looking to learn a bit about Erlang, I highly recommend using erldocs over the official documentation.

by Chad DePue at June 29, 2010 12:53 PM