First Monday

Letters to the Editor

From: Daniel DeMaggio
To: Sandeep Krishnamurthy
CC: Edward J. Valauskas, Chief Editor, First Monday
Subject: Cave or Community? An Empirical Examination of 100 Mature Open Source Projects
Date: 18 Jun 2002 10:03:40

In your paper, "Cave or Community? An Empirical Examination of 100 Mature Open Source Projects" [in First Monday, volume 7, number 6 (June 2002), at http://firstmonday.org/issues/issue7_6/krishnamurthy/], you make some bold claims about open source being developed by 'lone developers.' But your conclusions are suspect because of the completely unscientific nature of your analysis.

Let me give you an example: GhostScript is a well-respected, mature open source program. It is distributed with every Linux desktop distribution. It is central to many printing architectures. The blog on their homepage shows hundreds of accounts. The Helpers.htm file included in the distribution credits about 50 people for many fixes and even entire printer drivers.

Here's the problem:

  1. You didn't count GhostScript in your analysis because it wasn't marked as "mature" on SourceForge.
  2. Your count of developers would have been extremely low (SF lists only 18 developers).
  3. You would have said there was no active mailing list, even though GS has 13 different active mailing lists (listed on their homepage, not on SF).

There's a term for this: GIGO (Garbage In, Garbage Out). You rely on SF data, but you provide no evidence that the data is useful. In fact, I would argue that mature products tend to pre-date SF, and are therefore the least likely to rely on SF features. After all, they existed happily before SF, so they are less likely to switch to a new mailing list or re-create accounts for all their developers on a new CVS.

At best, your data is heavily biased, and you failed to address that in your paper.

You also say:

"Formally separating software production from other steps in the development of OSS programs will provide greater clarity to the discussion of the OSS phenomenon."

I take issue with this for several reasons.

First, I think software development becomes more muddled if you try to break it up into steps. The "waterfall" development methodology (where the "coding" step clearly follows the "requirements analysis" step) has proven to be a terrible model. No software company does this, so why should OSS?

Second, OSS blurs the line between developers and users. If a user has the gumption to fix a bug or add a feature, he usually submits a patch to the mailing list. But your methodology would not count that as 'development.' (Indeed, it would take a hefty AI to read the mailing lists and see which patches actually make it into the released software so you could label those people 'developers'.) [Even worse, what do you call people like me who change their OSS code internally but don't submit patches?]
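
To make the counting problem concrete, here is a rough, purely illustrative sketch of how one might tally distinct patch senders from a project's mailing-list archive and compare that with the SourceForge developer listing. The archive filename is hypothetical, the "18 developers" figure is simply the SF number cited above, and the patch detection is only a crude heuristic; this is a sketch of the idea, not a finished tool.

    # Illustrative sketch only: count distinct patch senders in a
    # mailing-list archive (mbox format) and compare with the number of
    # developers listed on SourceForge. The filename is hypothetical and
    # the patch test is a simple heuristic.
    import mailbox
    import re

    MBOX_PATH = "ghostscript-devel.mbox"   # hypothetical archive file
    SF_LISTED_DEVELOPERS = 18              # the SF figure cited above

    def looks_like_patch(message):
        # Heuristic: does the plain-text body contain unified-diff markers?
        # (Multipart messages with attached patches are skipped here.)
        body = message.get_payload(decode=True) or b""
        text = body.decode("utf-8", errors="replace")
        return bool(re.search(r"^(--- |\+\+\+ |@@ )", text, re.MULTILINE))

    def patch_senders(mbox_path):
        # Collect the distinct From: addresses that posted patches.
        senders = set()
        for message in mailbox.mbox(mbox_path):
            if looks_like_patch(message):
                sender = message.get("From", "").strip().lower()
                if sender:
                    senders.add(sender)
        return senders

    if __name__ == "__main__":
        contributors = patch_senders(MBOX_PATH)
        print("Distinct patch senders on the list:", len(contributors))
        print("Developers listed on SourceForge:  ", SF_LISTED_DEVELOPERS)

Even a crude count like this would show whether the SF developer page understates the number of people actually contributing code.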

Third, it ignores 'soft' contributions. Users often suggest innovative ideas to the "real" developers via the mailing list. It's easy to ignore them and say they aren't really helping the developers. But consider Extreme Programming: there are two people at the computer, but only one types at a time. Would you say that only the guy at the keyboard is programming? If the other person isn't 'involved in software production,' then why have two people at the computer? By analogy, the people exchanging ideas on the mailing list, newsgroup, or IRC channel are directly involved in software production too.

Your idea of studying OSS for 'lone developers' is a good one. I'm sure we would all be surprised by the number of projects like that. But your article doesn't make a serious attempt to study the issue scientifically.

Daniel DeMaggio


Letter to the Editor, First Monday
From: Sandeep Krishnamurthy
To: Edward J. Valauskas, Chief Editor, First Monday

Introduction

One of the advantages of conducting research in the open-source area is that one gets a lot of feedback rapidly. Since the publication of my paper in First Monday [at http://firstmonday.org/issues/issue7_6/krishnamurthy/], I have been deluged with e-mails. My paper also generated interest on Slashdot (a leading tech news community), with a total of 270 comments at last count.

In this spirit of interactivity, I want to take advantage of this forum to address some of the feedback that I have received about the paper and its findings.

Preliminary Analysis

The main purpose of the paper was to attract the attention of the open-source academic community to the surprising findings of my study. Much of the research that I had seen up to that point presupposed the existence of community. The goal was to challenge this notion by tapping into the richness of the open source (OS) movement.

In short, my goal was to help create greater clarity in the discourse surrounding open-source communities.

I took great pains to make sure that all readers understood that the work was preliminary. The sample size, method of collection and the relationship of that size to the overall number of projects were clearly mentioned. I pointed out that the findings were limited to the projects I looked at and not all OS projects. I did not conduct detailed statistical analysis. I plan on conducting several follow-up studies to help us get greater insight into this topic.

It is common to conduct exploratory studies in academia, especially in an area without a long publication track record. It was in that spirit that the work was done. I thought researchers who were looking at all OS projects as production communities would be surprised to learn of the number of projects run by single individuals or very small groups.

For the record, I had contacted Sourceforge about providing access to their data. After an initial positive response, they backed off; phone calls and e-mails were not returned. If their dataset is made available (even in some truncated format) to academic researchers such as myself, I am sure many more interesting things can be done.

Also, the goal of the paper was not to deride the OS community.

Why did I find what I found?

Several people have provided alternative explanations for my findings.

One explanation for the small group size was that Sourceforge did not represent the OS movement. I find this hard to accept given the size and vibrant nature of Sourceforge. Perhaps some do not want to accept these projects as part of the OS movement.

Some argued that mature projects might need fewer people; there is some evidence to suggest this in my paper (see Table 4). This makes sense because mature projects are likely to have fewer bugs and are released more often, with each release requiring less effort.

Others have pointed out that developers classify their products into different groups on Sourceforge. In this way of thinking, projects that are, in fact, mature may be classified otherwise, and hence I may not have looked at all mature projects. It is hard to know how big a problem this is, and it is unclear why project administrators would classify projects that way.

Some project administrators told me that they might not have listed every single developer. However, one suspects that the smaller and less significant contributors may have been left out, making this a smaller problem. I plan on looking at this more carefully in future research.

I had also pointed out that the production stage may have to be isolated for study, as opposed to other functions (e.g., lead users, service, etc.). Some have objected to this thinking by saying that this is impossible to do in software, where there are strong feedback loops between all functions. I still stand by my statement.

Small ... Big. Does it Matter?

Most readers focused on Finding 1 of the paper, i.e., the small group size in the projects. This finding resonated very well with many developers; see the Slashdot thread, for instance. Many reasons were given for why developers would desire small groups.

In general, it seems like there is an inner core in most OS projects that is very hard to penetrate. These are the core producers, the trusted developers. To get in, one must frequently be in a position to impress the founder(s) with quality code on a consistent basis. This may take time and, in some situations, may never take place.

Is a small group size necessarily good or bad? One cannot really say. There may be no need for a large community of developers for smaller projects. It is okay for some projects to be created by small groups of developers (i.e., in caves).

So, What's Going On Here?

My current thinking is that there are two types of OS projects: the mega-organized ones and the small hobby projects. The mega-organized projects are the ones that we typically hear about (e.g., LINUX, Apache). There is no doubt that these projects have built large communities. My point is that there are also a whole lot of small "hobby" projects run by committed individuals. These are not huge in scope; however, there are a lot of them, and they are an integral part of the OS movement.

Building a community around an OS product is hard work. Developers need to be convinced that the product is exciting and important. LINUX has done well since it aimed high: it created a new operating system rather than an application or utility. Organizational backing helps. In the intense competition for developers, smaller products (especially those with medium to large scope) may suffer. In other cases, OS products may only be looking for a user audience, not for a community of developers.

The OS movement is large, heterogeneous, dynamic and evolving. Understanding its nature will provide academic researchers with fascinating issues to look at in the near and distant future. I look forward to working in this area in the foreseeable future.

Sandeep Krishnamurthy



Copyright ©2002, First Monday

Letter to the Editor
First Monday, volume 7, number 9 (September 2002),
URL: http://firstmonday.org/issues/issue7_9/letters/index.html