Google Summer of Code 2010 Proposal: Implementation of algorithm to infer gene duplications in BioRuby

Title:

Implementation of algorithm to infer gene duplications in BioRuby

Student:

Robert Kuo

Abstract:

This project will implement in BioRuby the algorithm for inferring gene duplications described in Zmasek and Eddy, 2001, “A simple algorithm to infer gene duplication and speciation events on a gene tree”, Bioinformatics, 17, 821-828. The project will include full documentation, tests, and examples.

Content:

Name: Bob Kuo
(The following fields have been removed here, but they are included in my official proposal to Google.)
Address:
Email Address:
Mobile Phone:
IRC Handle: bubaflub

I am 24, currently pursuing my master's degree, and living in Champaign-Urbana, IL. My undergraduate degree was in Math and Computer Science from the University of Illinois at Chicago, with coursework focusing on number and coding theory, numerical analysis, and algorithms. I am currently employed part-time as a web developer working in PHP and Ruby on Rails. I participated in last year's Google Summer of Code with the Perl Foundation (http://socghop.appspot.com/gsoc/student_project/show/google/gsoc2009/dukeleto/t124022226790) and completed the project successfully, ahead of schedule.

I am interested in BioRuby because I have always wanted to combine the sciences with programming, and I would like to learn more about evolutionary biology. I believe I am well-suited for this project not because I am the best Ruby programmer in the world, but because I am willing to learn and to work through algorithms step by step. There is a reference implementation written in Java against which I can test my Ruby code. My work as a web developer has required me to be multilingual, and I am comfortable reading and writing Java, Perl, PHP, and Ruby.

You can see my open source work at http://github.com/bubaflub. I would especially recommend looking through http://github.com/bubaflub/math-primality, my work from last year's Google Summer of Code with The Perl Foundation, which has extensive inline documentation and many tests.

All code developed for this project would be released on GitHub and available under the same terms as BioRuby itself.

Plan:
Before the start date (April 20th):

  1. Meet the BioRuby and Open Bioinformatics community
  2. Familiarize myself with the BioRuby package and the phyloXML format
  3. Read necessary papers (such as http://bioinformatics.oxfordjournals.org/cgi/content/abstract/17/9/821)
  4. Familiarize myself with the Java implementation of the algorithm (http://www.phylosoft.org/forester/applications/sdi/)
  5. Discuss and set expectations for the code: dependencies, code style, tests, documentation, etc.

First half (May 23rd to July 13th):

  1. Decide which libraries to use
  2. Spec out all necessary components with documentation and failing unit tests
  3. Write integration tests that cover the entire algorithm (i.e., given input “A”, I should get output “B”)
  4. Begin implementing the algorithm
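To illustrate the kind of input/output pair such an integration test would check, here is a minimal Ruby sketch of the core idea of Zmasek and Eddy (2001) as I understand it: each internal gene-tree node is mapped, in postorder, to the lowest common ancestor (LCA) in the species tree of its children's mappings, and it is labeled a duplication when its mapping equals the mapping of one of its children. The LCA is found by repeatedly moving up whichever node has the larger preorder number. This is only a sketch under stated assumptions: the `Node` class and the `infer_duplications` method are hypothetical, it does not use BioRuby's tree classes or phyloXML parsing, and it assumes fully binary trees whose gene leaves carry species names.

```ruby
# Hypothetical minimal tree node for illustration; this is NOT BioRuby's tree API.
class Node
  attr_accessor :name, :children, :parent, :species, :index, :mapping, :duplication

  def initialize(name = nil, children = [], species: nil)
    @name = name
    @children = children
    @species = species # for gene leaves: the species this gene comes from
    @children.each { |child| child.parent = self }
  end

  def leaf?
    children.empty?
  end
end

# Assign preorder numbers to the species tree; every ancestor gets a smaller number.
def number_preorder(node, next_index = 0)
  node.index = next_index
  next_index += 1
  node.children.each { |child| next_index = number_preorder(child, next_index) }
  next_index
end

def leaves(node)
  node.leaf? ? [node] : node.children.flat_map { |child| leaves(child) }
end

# LCA of two species-tree nodes: the node with the larger preorder number
# cannot be an ancestor of the other, so move it up to its parent.
def lca(a, b)
  until a.equal?(b)
    if a.index > b.index
      a = a.parent
    else
      b = b.parent
    end
  end
  a
end

# Postorder pass over a binary gene tree (assumption: exactly two children per
# internal node); returns the number of duplications in the subtree rooted at g.
def map_gene_tree(g, species_by_name)
  if g.leaf?
    g.mapping = species_by_name.fetch(g.species)
    return 0
  end
  count = g.children.sum { |child| map_gene_tree(child, species_by_name) }
  left, right = g.children
  g.mapping = lca(left.mapping, right.mapping)
  g.duplication = g.mapping.equal?(left.mapping) || g.mapping.equal?(right.mapping)
  count + (g.duplication ? 1 : 0)
end

def infer_duplications(gene_tree, species_tree)
  number_preorder(species_tree)
  species_by_name = leaves(species_tree).to_h { |s| [s.name, s] }
  map_gene_tree(gene_tree, species_by_name)
end

# Species tree ((HUMAN, MOUSE), FLY) and a gene tree with one HUMAN/MOUSE duplication.
human = Node.new("HUMAN")
mouse = Node.new("MOUSE")
fly = Node.new("FLY")
species_tree = Node.new(nil, [Node.new(nil, [human, mouse]), fly])

gene = ->(name, sp) { Node.new(name, species: sp) }
gene_tree = Node.new(nil, [
  Node.new(nil, [
    Node.new(nil, [gene.("h1", "HUMAN"), gene.("m1", "MOUSE")]),
    Node.new(nil, [gene.("h2", "HUMAN"), gene.("m2", "MOUSE")])
  ]),
  gene.("f1", "FLY")
])

count = infer_duplications(gene_tree, species_tree)
puts count # prints 1
```

Here the node joining the two HUMAN/MOUSE clades maps to the same species node as both of its children, so it is the single duplication; every other internal node is a speciation. Extending this to non-binary trees, one of the stretch goals, would require revisiting the two-children assumption in `map_gene_tree`.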

Second half (July 13th to August 10th):

  1. Finish implementing the algorithm
  2. Finish documentation and tests

In the event that I finish early:

  1. Profile and speed up the existing code
  2. Write additional documentation and tests
  3. Add more examples
  4. Extend the algorithm to handle non-binary species and gene trees

Obligations
I will continue to work part-time during the summer and may take summer classes. Last year, similar obligations were not a problem and did not affect my performance. Although I will be at work during the day, I will be available via email, IM, and IRC.
I welcome any feedback, comments, and critiques to this proposal.

Thanks,
Bob Kuo
