For labour intensive BioInformatics processing
July 20th 2001 / February 11, 2005
Prithwis Mukerjee, Ph.D.
India has become a favourite destination for multinational companies setting up back offices for large-volume data processing. This involves creating a world-class computing infrastructure that is used by a large number of low-cost personnel to churn out a high volume of repetitive transactions based on data collected, electronically or physically, at high-cost client locations. The people who perform these transactions are not highly skilled, but they work under very experienced and knowledgeable supervisors, with the help of detailed instruction sheets and easy-to-use enabling technology. This model can be extended into the area of biological data processing.
Contemporary research in the life sciences has revealed the need for extensive use of computers for ultra-large-volume data processing. The sequencing of the human and other genomes, and the subsequent mapping of individual nucleotide sequences to specific genes and then to the corresponding proteins, are some of the more glamorous examples of these requirements. However, there are other areas, for example the processing of clinical field trial data, that are equally important but not so well publicized.
Software tools for performing these tasks are very often already available, or under development and refinement, in more “advanced” research environments. However, few if any of these tools allow a fully automated process that generates a complete and self-sufficient result. Instead, most of these tools, such as BLAST and FASTA from the world of genome sequencing, do a part of the work and leave the result at a stage that needs further human (that is, intelligent) intervention. This human intervention calls for two kinds of skill: first, a low-end “technician” grade skill to do the preliminary spade work of weeding out inconsequential and otherwise irrelevant results, followed by a high-grade “deep” skill that leads to a fundamental decision. The closest analogy would be a pathology laboratory, where the technician does the initial work that is then “signed off” by a qualified and licensed medical practitioner. The pathology laboratory is a “wet lab” for in-vivo or in-vitro activities, whereas the model presented here is a “dry lab” for in-silico activities.
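To make the “technician” grade step concrete, here is a minimal sketch of the kind of preliminary weeding that could be scripted before a specialist reviews the results. It assumes BLAST output in the standard 12-column tab-separated tabular format. The function name `filter_hits`, the threshold values and the sample rows are illustrative assumptions, not part of any existing tool or of any real data set; appropriate cut-offs would have to be set by the supervising specialist.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=1e-5, min_identity=30.0):
    """Keep only hits that pass both thresholds (hypothetical defaults)."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    kept = []
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        if (float(hit["evalue"]) <= max_evalue
                and float(hit["pident"]) >= min_identity):
            kept.append(hit)
    return kept


# Made-up rows, for illustration only.
SAMPLE = (
    "q1\tsA\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "q1\tsB\t25.00\t40\t30\t0\t5\t44\t10\t49\t2.5\t18.9\n"
    "q2\tsC\t45.00\t150\t80\t2\t1\t150\t3\t150\t3e-20\t95.1\n"
)

if __name__ == "__main__":
    for hit in filter_hits(SAMPLE):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

A script of this kind only discards the obviously weak hits; the surviving hits would still go to the specialist for the “sign off” decision described above.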
The volume of work associated with many of these problems is extremely large, and this is where India has an advantage. The “low end” work could be done in India by an army of adequately trained technicians under the tight supervision and guidance of a small number of skilled biotechnology professionals. Considering the difference in salary levels between India and the industrialized countries, the cost advantage would be significant; in fact it would be on par with the corresponding advantage enjoyed by Indian software companies. The cost difference multiplied by the potential volume of work is the value proposition.
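The value proposition stated above is simple arithmetic, and can be sketched as follows. The function name `value_proposition` and every figure in the usage example are placeholders chosen only to show the calculation; they are not estimates of actual costs or volumes.

```python
def value_proposition(onshore_cost_per_hour, offshore_cost_per_hour, hours_of_work):
    """Cost difference per hour multiplied by the volume of work."""
    cost_difference = onshore_cost_per_hour - offshore_cost_per_hour
    return cost_difference * hours_of_work


if __name__ == "__main__":
    # Placeholder figures only, not real salary or volume data.
    print(value_proposition(40.0, 8.0, 100_000))
```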
While the initial focus would be on low end “Y2K” style work [to draw an analogy from the world of software], there is nothing that stops a successful Indian operation from building upon these skills and then climbing up the value chain to enter the world of high end software development and biotech services.
There are four main categories of questions that need to be addressed before any further action :
• What are the various technical problems that we can solve ? Or what are the specific services that can be offered in this mode ? Some possible examples could be :
o Transformation of nucleotide sequence data into potential genes and then to amino acid sequences and finally proteins
o Structural genomics leading to validated target molecules and drugs
o Data mining / Statistical analysis / Annotation of data generated by companies engaged in population genomics and clinical field trials
Here we must keep in mind that the window of opportunity that presents itself can close as automated tools improve. For example, the medical transcription market, which has been exploited for the past 10 years, may just disappear with the development of reliable speech recognition and speech-to-text technology.
• Is there a significant pool of “technician” grade people who have the skills or can be trained ?
Traditional back office operations have depended on the available pool of B.A., B.Sc. and B.Com. graduates who could be easily trained in invoice processing and reconciliation. Can something similar be done with B.Sc. and M.Sc. graduates who have a “good” background in Chemistry and Biochemistry ? Using the software analogy, we need NIIT/Aptech certified programmers, not general stream graduates or Computer Science graduates from IIT.
• Who would be the client for such services ? Would it be pharmaceutical companies ? Companies involved in population genomics ? Research laboratories ? What kind of budgets do they have and can they be persuaded to part with their data ? In short, what is the market for such services ?
• What kind of hardware, software, communication links and offshore methodology would be necessary to support such an operation ? Do we invest in this infrastructure or do we leverage global infrastructure by working “over the wire” ? Do we have the technical skills necessary to install, configure and if necessary build and/or tailor the tools that can deliver services on a reliable and timely basis ?
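As an illustration of the first example service listed under the first question (turning nucleotide sequence data into amino acid sequences), the core translation step can be sketched in a few lines. This is a minimal sketch, assuming the standard genetic code, a single reading frame starting at the first base, and DNA written with the letters T, C, A and G. The function name `translate` and its `to_stop` parameter are illustrative. Real gene finding involves much more, such as locating open reading frames on both strands, which is where the human review described earlier would come in.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA string codon by codon, starting at the first base.

    Stop codons are written as '*'. If to_stop is True, translation ends
    at the first stop codon, which is not included in the result.
    Codons containing letters other than T, C, A, G are written as 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```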
The first step would be to have a forum, address the four issues and evolve a broad consensus. If the outcome is positive, the next step would be to :
• Create a pilot project team consisting of biotechnology and computer professionals who would
o Identify a specific problem or service
o Create the solution delivery platform in terms of hardware and software
• In parallel, identify a set of target clients for such services
o Convince one client to participate in this service delivery model at sharply discounted prices