Google Summer of Code 2010 Proposal: Implementation of algorithm to infer gene duplications in BioRuby


Implementation of algorithm to infer gene duplications in BioRuby


Robert Kuo


This project will implement, in BioRuby, the algorithm for detecting gene duplications described in Zmasek and Eddy, 2001, “A simple algorithm to infer gene duplication and speciation events on a gene tree”, Bioinformatics, 17, 821-828. The project will include full documentation, tests, and examples.
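
To make the goal concrete, here is a minimal sketch of the core idea as I currently understand it from the paper: number the species tree in preorder, walk the gene tree in postorder, map each gene node to a species node, and call a node a duplication when it maps to the same species node as one of its children. The `Node` struct and the `leaf`, `join`, and `infer_events` helpers are hypothetical names made up for this illustration; they are not part of BioRuby, and the sketch assumes rooted binary trees whose gene leaves are labelled with species names.

```ruby
# Hypothetical tree node for illustration only (not a BioRuby class).
Node = Struct.new(:name, :children, :parent, :order, :mapped, :event) do
  def leaf?
    children.empty?
  end
end

def leaf(name)
  Node.new(name, [])
end

def join(left, right)
  node = Node.new(nil, [left, right])
  left.parent = node
  right.parent = node
  node
end

# Preorder numbering: every ancestor gets a smaller number than its descendants.
def number_preorder(node, counter = [0])
  node.order = (counter[0] += 1)
  node.children.each { |child| number_preorder(child, counter) }
end

# Index species-tree leaves by name so gene leaves can be mapped to them.
def species_leaves(node, index = {})
  if node.leaf?
    index[node.name] = node
  else
    node.children.each { |child| species_leaves(child, index) }
  end
  index
end

# Postorder walk of the gene tree: map each node, then label internal nodes.
def annotate(gene, by_name)
  if gene.leaf?
    gene.mapped = by_name.fetch(gene.name)
    return
  end
  gene.children.each { |child| annotate(child, by_name) }
  a, b = gene.children.map(&:mapped)
  # Climb from whichever node has the larger preorder number until both meet.
  until a.equal?(b)
    if a.order > b.order
      a = a.parent
    else
      b = b.parent
    end
  end
  gene.mapped = a
  duplicated = gene.children.any? { |child| child.mapped.equal?(a) }
  gene.event = duplicated ? :duplication : :speciation
end

def infer_events(gene_root, species_root)
  number_preorder(species_root)
  annotate(gene_root, species_leaves(species_root))
  gene_root
end

# Species tree ((A,B),C) and gene tree ((A,B),(A,C)).
species = join(join(leaf("A"), leaf("B")), leaf("C"))
gene = infer_events(join(join(leaf("A"), leaf("B")), join(leaf("A"), leaf("C"))), species)
puts gene.event                # root of the gene tree
puts gene.children.map(&:event).inspect
```

In this toy example the gene-tree root should be labelled a duplication, since it maps to the same species node as its (A,C) child, while the two inner nodes are speciations. The real implementation would work on BioRuby's own tree classes and phyloXML input rather than this stand-in struct.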


Name: Bob Kuo
(the following have been removed from here but they are in my official proposal to Google)
Email Address:
Mobile Phone:
IRC Handle: bubaflub

I am 24, living in Champaign-Urbana, IL, and currently pursuing my master’s degree. My undergraduate degree was in Math and Computer Science from the University of Chicago at Illinois, with my coursework focusing on number and coding theory, numerical analysis, and algorithms. I am currently employed part-time as a web developer working in PHP and Ruby on Rails. I participated in last year’s Google Summer of Code by working with the Perl Foundation on a project that was successfully completed ahead of schedule.

I am interested in BioRuby because I have always wanted to combine the sciences with programming and would like to learn more about evolutionary biology. I believe I am well-suited for this project not because I am the best Ruby programmer in the world, but because I am willing to learn and work through algorithms step-by-step. There is a reference implementation written in Java against which I can test my Ruby implementation. My work as a web developer has required me to be multi-lingual, and I am comfortable reading and writing Java, Perl, PHP, and Ruby.

You can see my open source work at, and I would recommend looking through–primality, my work from last year’s Google Summer of Code with The Perl Foundation, which has significant in-line documentation and many tests.

All code developed for this project would be released on GitHub and available under the same terms as BioRuby itself.

Before the start date (April 20th):

  1. Meet the BioRuby and Open Bioinformatics community
  2. Familiarize myself with the BioRuby package and the phyloXML format
  3. Read necessary papers
  4. Familiarize myself with the Java implementation of the algorithm
  5. Discuss and set expectations of code – dependencies, code style, tests, documentation, etc.

First half (May 23rd to July 13th):

  1. Decide on which libraries to use
  2. Spec out all necessary components with documentation and failing unit tests
  3. Write integration tests that cover the entire algorithm (i.e. if I input “A” I should get “B”)
  4. Begin implementing the algorithm

Second half (July 13th to August 10th):

  1. Finish implementing the algorithm
  2. Finish documentation and tests

In the event that I finish early:

  1. Profiling and speeding up existing code
  2. Extra documentation and tests
  3. More examples
  4. Extend the algorithm to use non-binary species and gene trees.

I will continue to work part-time during the summer and may take summer classes. Last year these obligations were not problematic and did not affect my performance. Although I will be at work during the day, I will be available via email, IM, and IRC.
I welcome any feedback, comments, and critiques to this proposal.

Bob Kuo
