Brief study of tools to analyze software repositories

While working on other topics, I put together a list of tools to analyse software repositories and source code.

The list focuses on tools that analyse source code management systems (such as CVS or Git), mailing lists, bug tracking systems and the source code itself.

 

CVSAnalY2:

  • Description: analyses several source code management systems and stores all of the information found in a relational database. CVSAnalY2 currently supports CVS, SVN and Git.
  • URL: http://git.libresoft.es/cvsanaly/
  • Cite: Tools for the Study of the Usual Data Sources found in Libre Software Projects. International Journal of Open Source Software and Processes. Vol. 1, issue 1.

 

MailingListStats:

  • Description: analyses mailing lists and stores all of the information in a relational database.
  • URL: http://git.libresoft.es/mailingliststat/
  • Cite: Tools for the Study of the Usual Data Sources found in Libre Software Projects. International Journal of Open Source Software and Processes. Vol. 1, issue 1.

 

Bicho:

  • Description: analyses bug tracking systems and stores all of the information in a relational database. This tool currently supports specific installations of Bugzilla, Jira and the SourceForge tracker.
  • URL: http://git.libresoft.es/bicho/
  • Cite: Tools for the Study of the Usual Data Sources found in Libre Software Projects. International Journal of Open Source Software and Processes. Vol. 1, issue 1.

 

Guilty:

  • Description: stores the authorship information of each line for a given revision in several source code management systems such as SVN, CVS and Git.
  • URL: http://git.libresoft.es/guilty/
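
To give an idea of what per-line authorship data looks like, here is a minimal Python sketch that counts lines per author from the output of `git blame --line-porcelain`. It is not how Guilty itself is implemented; the function name `lines_per_author` and the sample text are made up for this illustration.

```python
from collections import Counter


def lines_per_author(porcelain_text: str) -> Counter:
    """Count source lines per author in `git blame --line-porcelain` output.

    In that format every blamed line carries its own header block, which
    includes one "author <name>" field, followed by the line content
    prefixed with a tab character.
    """
    counts = Counter()
    for line in porcelain_text.splitlines():
        # Content lines start with a tab, so they can never be mistaken
        # for the "author " header field ("author-mail" has no space there).
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


# Hypothetical, shortened sample of porcelain output (two blamed lines).
SAMPLE = (
    "1111111111111111111111111111111111111111 1 1 1\n"
    "author Alice\n"
    "author-mail <alice@example.org>\n"
    "\tprint('hello')\n"
    "2222222222222222222222222222222222222222 2 2 1\n"
    "author Bob\n"
    "author-mail <bob@example.org>\n"
    "\tprint('world')\n"
)

if __name__ == "__main__":
    print(lines_per_author(SAMPLE))
```

In practice the text would come from running `git blame --line-porcelain <file>` on a given revision, and the counts would be stored in a database rather than printed.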

 

BlameMe:

  • Description: stores the differences between each pair of revisions from Mercurial and Git repositories in a MySQL database.
  • URL: http://git.libresoft.es/blameme
  • Cite: Are Developers Fixing Their Own Bugs?: Tracing Bug-Fixing and Bug-Seeding Committers (pages 23-42). Daniel Izquierdo-Cortazar (Universidad Rey Juan Carlos, Spain), Andrea Capiluppi (University of East London, UK), and Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos, Spain)

 

SLOCCount:

  • Description: analyses the source code and reports the programming languages used, the number of files and lines, and an effort estimate based on the COCOMO model.
  • URL: www.dwheeler.com/sloccount/
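
As a rough illustration of a COCOMO-style estimate, the sketch below applies the Basic COCOMO formulas with the organic-mode coefficients (2.4, 1.05, 2.5, 0.38). That SLOCCount uses exactly these values by default is an assumption on my side, so check its documentation; the function name `basic_cocomo_organic` is made up for the example.

```python
from typing import Tuple


def basic_cocomo_organic(sloc: int) -> Tuple[float, float]:
    """Return (effort in person-months, schedule in months).

    Uses the Basic COCOMO organic-mode coefficients, which are an
    assumption of this sketch rather than a statement about any tool:
        effort   = 2.4 * KSLOC ** 1.05
        schedule = 2.5 * effort ** 0.38
    """
    ksloc = sloc / 1000.0
    effort_pm = 2.4 * ksloc ** 1.05
    schedule_months = 2.5 * effort_pm ** 0.38
    return effort_pm, schedule_months


if __name__ == "__main__":
    effort, months = basic_cocomo_organic(50_000)
    print(f"effort: {effort:.1f} person-months, schedule: {months:.1f} months")
```

The only input is the number of source lines, which is why a line counter can produce such an estimate at all; it also means the estimate knows nothing about the team or the development process.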

 

OhCount:

  • Description: analyses the source code providing information about the type of programming language and number of lines and files.
  • URL: www.ohloh.net/p/ohcount

 

Cloc:

  • Description: analyses the source code providing information about the type of programming language and number of lines and files.
  • URL: http://cloc.sourceforge.net
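
What OhCount and Cloc report can be approximated with a few lines of Python. This is only a toy sketch: the extension table `LANGUAGES` and the function `count_by_language` are invented for the example, and it counts physical lines, whereas the real tools recognise many more languages and separate code, comment and blank lines.

```python
import os
from collections import defaultdict

# Hypothetical, deliberately tiny extension-to-language table.
LANGUAGES = {".py": "Python", ".c": "C", ".h": "C", ".java": "Java"}


def count_by_language(root: str) -> dict:
    """Return {language: {"files": n, "lines": m}} for the tree under root."""
    totals = defaultdict(lambda: {"files": 0, "lines": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            language = LANGUAGES.get(extension)
            if language is None:
                continue  # unknown file type: skipped in this sketch
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                lines = sum(1 for _ in handle)
            totals[language]["files"] += 1
            totals[language]["lines"] += lines
    return dict(totals)


if __name__ == "__main__":
    for language, stats in sorted(count_by_language(".").items()):
        print(language, stats["files"], "files,", stats["lines"], "lines")
```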

 

Cmetrics:

 

PyMetrics:

 

PerlMetrics:

 

pmccabe:

 

ckjm:

 

FOSSology:

  • Description: analyses software from different perspectives, such as licenses, meta data extraction and MIME type identification.
  • URL: http://www.fossology.org/
  • Cite: Robert Gobeille. 2008. The FOSSology project. In Proceedings of the 2008 international working conference on Mining software repositories (MSR ’08). ACM, New York, NY, USA, 47-50

 

Blackbird:

 

Ninka:

  • Description: analyses the source code and identifies licenses.
  • URL: http://ninka.turingmachine.org/
  • Cite: A sentence-matching method for automatic license identification of source code files by D.M. German, Y. Manabe and K. Inoue. In Proceedings of the IEEE/ACM international Conference on Automated Software Engineering (ASE) 2010, pp: 437–446

 

Sonar:

 

ConQAT:

  • Description: creates a quality dashboard that allows continuous tracking of the characteristics of a software system.
  • URL: http://conqat.cs.tum.edu/index.php/ConQAT
  • Cite: Tool Support for Continuous Quality Control. Deissenboeck, F., Juergens, E., Hummel, B., Wagner, S., Mas y Parareda, B., Pizka, M. IEEE International Workshop on Software Technology and Engineering Practice, 2005.

 

I am probably missing some tools, so you are more than welcome to help me complete this list :) .

Some updates (thanks for the pointers!):

 

Analizo:

  • Description: “Analizo is a free, multi-language, extensible source code analysis and visualization toolkit”
  • URL: http://analizo.org/

 

Gitdm:

 

 

Paper accepted at ESE: ‘Effort estimation of FLOSS projects: a study of the Linux kernel’

More good news :) . Paper accepted at the Empirical Software Engineering journal. I’d like to personally thank Andrea Capiluppi for his invaluable work on this paper :) .

In this paper we studied the relationship between effort, the release effect (or deadline stress) when submitting changes, the time of day and the experience of the developers (measured by whether they belong to the core group). We observed that changes made during the late night hours tend to increase the complexity of the source code. In addition, there seems to be a release effect when comparing the complexity of the source code before and after a given release.

Abstract: Empirical research on Free/Libre/Open Source Software (FLOSS) has shown that developers tend to cluster around two main roles: “core” contributors differ from “peripheral” developers in terms of a larger number of responsibilities and a higher productivity pattern. A further, cross-cutting characterization of developers could be achieved by associating developers with “time slots”, and different patterns of activity and effort could be associated to such slots. Such analysis, if replicated, could be used not only to compare different FLOSS communities, and to evaluate their stability and maturity, but also to determine within projects, how the effort is distributed in a given period, and to estimate future needs with respect to key points in the software life-cycle (e.g., major releases). This study analyses the activity patterns within the Linux kernel project, at first focusing on the overall distribution of effort and activity within weeks and days; then, dividing each day into three 8-hour time slots, and focusing on effort and activity around major releases. Such analyses have the objective of evaluating effort, productivity and types of activity globally and around major releases. They enable a comparison of these releases and patterns of effort and activities with traditional software products and processes, and in turn, the identification of company-driven projects (i.e., working mainly during office hours) among FLOSS endeavors. The results of this research show that, overall, the effort within the Linux kernel community is constant (albeit at different levels) throughout the week, signalling the need of updated estimation models, different from those used in traditional 9am–5pm, Monday to Friday commercial companies. 
It also becomes evident that the activity before a release is vastly different from after a release, and that the changes show an increase in code complexity in specific time slots (notably in the late night hours), which will later require additional maintenance efforts.
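
The idea of splitting each day into three 8-hour time slots can be sketched in Python as follows. The abstract does not give the slot boundaries, so the ones used here (09:00-17:00, 17:00-01:00, 01:00-09:00, loosely inspired by its "9am–5pm" remark) are an assumption, as are the slot labels and the function names.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def time_slot(timestamp: datetime) -> str:
    """Map a commit timestamp to one of three 8-hour slots.

    The boundaries are assumptions made for this sketch:
    09:00-17:00, 17:00-01:00 and 01:00-09:00.
    """
    hour = timestamp.hour
    if 9 <= hour < 17:
        return "office hours"
    if hour >= 17 or hour < 1:
        return "after office"
    return "late night"


def commits_per_slot(timestamps: Iterable[datetime]) -> Counter:
    """Count how many commits fall into each slot."""
    return Counter(time_slot(ts) for ts in timestamps)


if __name__ == "__main__":
    # Made-up commit times, only to show the call.
    sample = [
        datetime(2011, 12, 1, 10, 0),
        datetime(2011, 12, 1, 23, 30),
        datetime(2011, 12, 2, 3, 0),
    ]
    print(commits_per_slot(sample))
```

Note that a real analysis also has to decide whose clock counts: commit timestamps carry a timezone offset, and "late night" only makes sense in the developer's local time.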

 

Enjoy!

 

PS: more info at the paper website

 

Paper accepted at IJOSSP: ‘Are developers fixing their own bugs?’

Some good news: paper accepted at the International Journal of Open Source Software and Processes.

The point of this paper was to study whether developers usually fix the bugs they involuntarily introduce in the source code. Our initial intuition was that they do, except in projects where developer turnover is too high. However, in the case of the comm-central repository (which partially hosts Thunderbird, from the Mozilla Foundation), the results showed that developers fix their own bugs only in a few cases (around 5%). Surprising, isn’t it? :) .

Abstract: The process of fixing software bugs plays a key role in the maintenance activities of a software project. Ideally, code ownership and responsibility should be enforced among developers working on the same artifacts, so that those introducing buggy code could also contribute to its fix. However, especially in FLOSS projects, this mechanism is not clearly understood: in particular, it is not known whether those contributors fixing a bug are the same introducing and seeding it in the first place. This paper analyzes the comm-central FLOSS project, which hosts part of the Thunderbird, SeaMonkey, Lightning extensions and Sunbird projects from the Mozilla community. The analysis is focused at the level of lines of code and it uses the information stored in the source code management system. The results of this study show that in 80% of the cases, the bug-fixing activity involves source code modified by at most two developers. It also emerges that the developers fixing the bug are only responsible for 3.5% of the previous modifications to the lines affected; this implies that the other developers making changes to those lines could have made that fix. In most of the cases the bug fixing process in comm-central is not carried out by the same developers than those who seeded the buggy code.
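
The kind of line-level measure mentioned in the abstract can be illustrated with a small Python sketch: given the authors of the previous modifications to the lines touched by a fix, compute the share that belongs to the developer who made the fix. This is a simplification for illustration only, not the procedure used in the paper; the function name and the sample data are hypothetical.

```python
from typing import Sequence


def fixer_share(previous_authors: Sequence[str], fixer: str) -> float:
    """Fraction of previous modifications that were made by the fixer.

    `previous_authors` holds one entry per earlier modification of the
    lines touched by the bug fix (for instance, taken from blame data).
    """
    if not previous_authors:
        return 0.0
    own = sum(1 for author in previous_authors if author == fixer)
    return own / len(previous_authors)


if __name__ == "__main__":
    # Made-up history of the lines affected by one fix.
    history = ["alice", "bob", "alice", "carol"]
    print(fixer_share(history, "alice"))
    print(len(set(history)), "distinct developers touched these lines")
```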

 

More information can be found at the IJOSSP website.

Working on the “Evaluation of libre software projects” subject

It seems that Christmas is quite close, but before that, I have to teach the “Evaluation of Libre Software Projects” subject together with Pedro Coca at the Universidad Rey Juan Carlos :) .

So, my doubt was where to host the slides (in addition to the Master on Free Software Moodle). The plan is to use the Scribd website again, creating a new shelf there, so welcome to the Evaluation of FLOSS projects shelf :) .

Buying e-books in Spain

I open the online edition of El País and find a news item that reads: “E-books from 1.99 euros and without copy protection”.

So I wonder whether publishers in Spain are finally going to get their act together on this issue. To save you from reading the rest of the post, the answer is no: nothing has changed.

A practical example:

I go to the site announced by El País, B de Books, and pick a book by Vázquez Figueroa that catches my eye, called Coltan. B de Books advertises it at 3.99 euros and, in theory, without DRM. However, clicking on it redirects me to another site, Todoebook. And that page says that the book costs 7.5 euros and that it does use DRM.

So I ask myself:

1- First of all, why am I redirected to another site? Didn’t the El País article say that books could be bought with fewer “clicks”?

2- Second, why does their page say it costs 3.99 euros when it costs 7.5 euros where you actually buy it?

3- And finally, it comes with DRM! We are back to the same old story, and on Linux I suppose I will have to use Wine, etc, etc…

In short, I don’t really know what all these bookshops are playing at, with newspapers announcing things that are not true and all the rest of the spiel, if in the end we are where we always were. It makes you want to buy a Kindle and let Amazon take all the money. After all, they never learn here…

 

 

Are traditional publishers moving to Open Access?

Looking around the Empirical Software Engineering journal from Springer, I realized that there is an option to license the work under a libre license (let me use the term “libre” as free as in free beer and as in free speech, as usual). At least this is what I understand from the open access website of Springer (hopefully, I’m not missing anything :P ):

Anyone is free:

  • to copy, distribute, and display the work;
  • to make derivative works;
  • to make commercial use of the work.

In fact, in the current volume (December 2011), the last paper is accessible to everyone.

However, as far as I know, open access journals usually charge the authors, since part of their business model is lost due to the “openness” of the papers.

Does anyone know how this works in the case of Springer if I want to license my work under a Creative Commons license?

Distributed source code management system patented?

Has the Git / Mercurial / Bazaar idea been patented?

Some days ago I was searching Google and Google Scholar for academic references on distributed software development. However, it was quite surprising to find a US patent describing the way a distributed source code management system works.

Specifically, this patent dates from 1997 and can be found through a Google search.

After having a look at the document, it is still not clear to me whether a piece of software such as Git or Mercurial is somehow “illegal” in the USA because of this patent…

Patents and the mobile market

There is a quite interesting post on Google’s blog about their view of software patents and how Android is being “attacked” by specific companies that are buying more and more patents.

A couple of gems:

Patents were meant to encourage innovation, but lately they are being used as a weapon to stop it.

Unless we act, consumers could face rising costs for Android devices — and fewer choices for their next phone.

So Google is probably pointing exactly at the problem: what software patents were supposed to be versus what they really are.

Let’s think about how this may affect the business model of SMEs, which cannot respond the way Google, Samsung or HTC do…

Thanks to Xataca for the pointer.

Education in the 21st century

“On entering the classroom it almost seems as if knowledge were something scarce, difficult, practically impossible to obtain unless we have a suitably qualified adult in front of us” – Curtis Johnson (La manera disruptiva de aprender – Redes)

I think that sentence basically sums up what one finds in any classroom today. The underlying problem is: how can it be improved?

The video, highly recommended, explains that education will gradually shift towards a model in which students stop being passive actors and become fully active ones. What is the point of a teacher giving the exact date of an important event such as the French Revolution if the student can look it up on the Internet or in a digital encyclopedia? After all, any child knows how to search the Internet.

The teacher’s role should therefore shift towards helping students develop specific skills, such as teamwork or empathy towards other people’s situations, rather than merely transmitting knowledge.

The video explains how online platforms bring together people who are far apart and even allow a much more direct relationship between teacher and student, since answers are actively being requested.

Although brief, I think this is a small reflection that anyone who currently works as a teacher in one way or another should make: how to improve the system they currently use in order to get the most out of each of the people they teach. In fact, not everyone learns in the same way, nor at the same speed, although that last point is a topic that deserves separate treatment.

 

 

QualOSS: Interviewing Companies

This project started around four years ago. I had just joined LibreSoft, and this was my first project in terms of gaining experience and working in an international environment. It was somehow my starting point in the real software engineering field: working with companies and academic partners and having real responsibilities.

And here we go: QualOSS basically started the way this kind of project starts: looking around and checking what could be useful. In fact, one of the main purposes of the project was to have a real industrial application, which implies talking to companies.

Several companies were interviewed by the QualOSS consortium, two or three per partner. The main questions were derived from the state of the art in software quality at the time, while also trying to bring the idea of open source into the whole framework.

From my personal point of view, the main thing we achieved with this approach was to realise that there were roughly three types of company:

  • Companies that do not care about open source as such: if it is useful for the company, good news; if not, bad luck.
  • Companies that base their business model on open source.
  • A mix of the two aforementioned types.

Looking deeper into the second type, I could say that those companies were particularly interested in the licensing process and in the community around a product. In other words: the product may be great, but they also want a good set of people supporting it as a community. Thus this was a key factor.

So, after all, we created a preliminary definition of the main quality attributes addressed in the project (Robustness and Evolvability) based on the interviews:

As we can see, community quality was considered part of the quality model by some companies, which means that for companies open source does not just mean free software (as in free beer); other factors are also being raised as important enough to be studied.

More information can be found at deliverable 1.2 from the QualOSS consortium.