They say that necessity is the mother of invention. Whoever 'they' are, it seems to be a true statement. I wanted to do something useful, creative, and, well, publishable ( :) ) during my sabbatical. And so, when I was lucky enough to strike up a conversation with Tony Williams at the 246th ACS meeting in Indianapolis, I suggested that we work on a project together. Then I had to come up with something, and fast...
Being an analytical chemist turned cheminformatician made the scope of any proposed project both limited and obvious. Tony and his team are working on the RSC's data repository, the next big project since ChemSpider (http://www.chemspider.com), and so, after some discussion, we came up with 'Data Standards for Representation and Annotation of Analysis Information'. The project proposal outlined the idea of developing ChAMP as the basis on which the RSC could text-mine its archive of over 300,000 research papers for any reported analytical methodology.
I guess it goes without saying that the proposal was funded, since you would not be reading this if it weren't. In the six months since then I have had many ideas that continue to convince me this is a really good one. I have been fortunate enough in my academic career to work on a number of projects that have shaped my perspective on the fundamental need for ChAMP, such as:
- The Flow Analysis Database (http://www.fad.unf.edu)
- The Analytical Sciences Digital Library (http://www.asdlib.org)
- The Analytical Information Markup Language (AnIML) (http://animl.sourceforge.net)
- The Units Markup Language (UnitsML) (http://unitsml.nist.gov)
- JCAMP-DX (http://jcamp-dx.org/)
So, the scope of ChAMP is broad, with many user communities, perspectives, and needs. The nice thing is that this is not a standards project. Even though the original proposal has 'Standards' in the title (and in some ways we are going to 'standardize' some things), the platform will not be a monolithic mandate to the masses about how to annotate a chemical analysis. It will be a platform that describes metadata elements, controlled vocabularies, ontologies, and datatype specifications such that:
- Standards can be built from the platform to fit the needs of particular fields/application areas (e.g. pharmaceuticals, environmental)
- Educators can teach chemical analysis using defined vocabulary terms (and potentially create teaching material linked to those terms)
- Publishers/authors can annotate papers as they are accepted, making comparison to existing research easier
- Analysts can move toward consensus about the 'Minimal Amount of Information' needed to characterize different analytical techniques (see MIAME)
- Integration of chemical analysis information into the semantic web is easy
It is obvious that this project is going to take time, certainly longer than my sabbatical. So, my job is to gather data on what's been done, identify the needs of the community (help me!), write first drafts of metadata/vocabularies/ontologies/best practices, catalyze discussion around this topic, and encourage participation. To me this sounds like fun and I am 'ChAMPing' at the bit, as it were. SJC.