April 23, 2017

Beautiful and unusual gift from PMI West Bengal

Yesterday, I had the good fortune to be invited to speak at the PMI Regional Conference where, instead of the regular (and pointless) bouquet of flowers that is traditionally given to the keynote speaker, I was presented with the following certificate:


What this means is that PMI has paid Sankalptaru.org to plant 10 trees on my behalf, and "my tree" can be seen at the URL indicated by the QR code.

Thank you, PMI, for this unusual gift!

April 22, 2017

DB2 to Lotus : Accessing Mainframe Data from PC in the pre-Windows age


April 16, 2017

Spark with Python in Jupyter Notebook on Amazon EMR Cluster

In the previous post, we saw how to run a Spark + Python program in a Jupyter Notebook on a standalone EC2 instance on Amazon AWS, but the really interesting part is to run the same program on a genuine Spark cluster consisting of one master and multiple slave machines.

The process is explained pretty well in Tom Zeng's blog post and we follow the same strategy here.

1. Install the AWS Command Line Interface (CLI) by following these instructions.
2. Configure the AWS CLI with your AWS credentials using these instructions.

In particular, the following is necessary:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE 
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: ENTER

You will, of course, have to use your own AWS Access Key ID and AWS Secret Access Key!
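For the curious: `aws configure` simply saves these answers in two INI-style files in your home directory, `~/.aws/credentials` (the keys) and `~/.aws/config` (the region), under a `[default]` profile. The sketch below only illustrates that layout with Python's `configparser`, writing into a scratch directory rather than your real home directory; the helper name `write_aws_profile` is hypothetical, and the key values are the documentation's example placeholders.

```python
import configparser
import os
import tempfile

def write_aws_profile(base_dir, key_id, secret, region):
    """Hypothetical helper: write the same files `aws configure` creates."""
    aws_dir = os.path.join(base_dir, ".aws")
    os.makedirs(aws_dir, exist_ok=True)

    # Credentials go in ~/.aws/credentials under the [default] profile
    creds = configparser.ConfigParser()
    creds["default"] = {"aws_access_key_id": key_id,
                        "aws_secret_access_key": secret}
    with open(os.path.join(aws_dir, "credentials"), "w") as f:
        creds.write(f)

    # The default region goes in ~/.aws/config
    conf = configparser.ConfigParser()
    conf["default"] = {"region": region}
    with open(os.path.join(aws_dir, "config"), "w") as f:
        conf.write(f)
    return aws_dir

if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # never touch the real ~/.aws in this demo
    path = write_aws_profile(scratch, "AKIAIOSFODNN7EXAMPLE",
                             "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                             "us-east-1")
    with open(os.path.join(path, "credentials")) as f:
        print(f.read())
```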

3. Execute the following command:

aws emr create-cluster --release-label emr-5.2.0 \
  --name 'Praxis - emr-5.2.0 sparklyr + jupyter cli example' \
  --applications Name=Hadoop Name=Spark Name=Tez Name=Ganglia Name=Presto \
  --ec2-attributes KeyName=pmapril2017,InstanceProfile=EMR_EC2_DefaultRole \
  --service-role EMR_DefaultRole \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c3.4xlarge \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=c3.4xlarge \
  --region us-east-1 \
  --log-uri s3://yj01/emr-logs/ \
  --bootstrap-actions \
    Name='Install Jupyter notebook',Path="s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5.sh",Args=[--r,--julia,--toree,--torch,--ruby,--ds-packages,--ml-packages,--python-packages,'ggplot nilearn',--port,8880,--password,praxis,--jupyterhub,--jupyterhub-port,8001,--cached-install,--copy-samples]

Note that the options have been modified a little:
a) the number of machines is 1 + 2 (one master and two core nodes)
b) the S3 bucket used for logs is yj01, in s3://yj01/emr-logs/
c) the password is set as "praxis"
d) the directive to store notebooks on S3 has been removed, as it was causing problems. The notebooks will now be stored in the home directory of the user hadoop on the master node

This command returns (or something similar):
{
    "ClusterId": "j-2LW0S8SAX5OC4"
}

4. Log in to the AWS console and go to the EMR section.

The cluster will first show up as Starting,

then move into Bootstrapping mode,

and after about 22 minutes will move into Waiting mode. If it reaches Waiting much sooner than that, there may have been an error in the bootstrap process. Otherwise, you will see this:

5. Log in to JupyterHub
Note the Master Public DNS name: ec2-54-82-207-124.compute-1.amazonaws.com
and point your browser to: http://ec2-54-82-207-124.compute-1.amazonaws.com:8001



Log in with user = hadoop and password = praxis (supplied in the command), and you will get the familiar Notebook interface.


There will be a samples directory containing sample programs covering a wide range of technologies and data science applications. Extremely useful to cut-and-paste from!

Create a work directory and upload the WordCount notebook and the hobbit.txt file used in the original Spark + Python blog post.

Notice the changes necessary for cluster operations:


Cells 1-3 reflect the fact that we are now using a cluster, not a local machine.
Cells 4 and 12 show that the program is NOT accessing the local file storage on the master node but the HDFS file system on the cluster.
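The notebook itself is not reproduced in this post, so the sketch below only illustrates the idea. It assumes the usual flatMap / map / reduceByKey formulation of WordCount and HDFS paths of hdfs:///user/hadoop/hobbit.txt and hdfs:///user/hadoop/hobbit-out (both assumptions, based on the hadoop user's HDFS home directory described further down). The PySpark version in the comments is what would run on the cluster; the plain-Python function performs the same three steps so that the logic can be tried anywhere.

```python
from collections import Counter
from itertools import chain

# On the cluster (PySpark, assuming `sc` is the notebook's SparkContext):
#   counts = (sc.textFile("hdfs:///user/hadoop/hobbit.txt")
#               .flatMap(lambda line: line.split())
#               .map(lambda word: (word, 1))
#               .reduceByKey(lambda a, b: a + b))
#   counts.saveAsTextFile("hdfs:///user/hadoop/hobbit-out")

def word_count(lines):
    """Plain-Python equivalent of the Spark pipeline sketched above."""
    words = chain.from_iterable(line.split() for line in lines)  # flatMap
    pairs = ((word, 1) for word in words)                         # map
    counts = Counter()
    for word, one in pairs:                                       # reduceByKey
        counts[word] += one
    return dict(counts)

if __name__ == "__main__":
    sample = ["In a hole in the ground", "there lived a hobbit"]
    print(word_count(sample))
```

On the cluster the data never passes through the master's local disk: Spark reads the HDFS blocks on the core nodes and writes the part files of hobbit-out back into HDFS.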

To explore the HDFS file system, go back to this screen

and then press "View All". Click on the HDFS link, which will take your browser to
http://ec2-54-82-207-124.compute-1.amazonaws.com:50070
and see

and you can browse to the hadoop user home HDFS directory where the "hobbit.txt" file was stored and where the "hobbit-out" directory has been created by the Spark program. In fact, all HDFS operations can be carried out from the Notebook cells like this

!hdfs dfs -put hobbit.txt /user/hadoop/
!hdfs dfs -get /user/hadoop/hobbit-out/part* .
!hdfs dfs -ls hobbit-out/
!hdfs dfs -rm hobbit-out/*
!hdfs dfs -rmr hobbit-out
!hdfs dfs -rm hobbit.txt
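Outside IPython's `!` shortcut, the same commands can be issued from ordinary Python with the standard `subprocess` module. The helper below is a hypothetical convenience, not part of any library: it builds the `hdfs dfs` argument list and executes it only when `run=True` and an `hdfs` executable is on the PATH, as it is on the EMR master node. On Hadoop 2.x, `-rmr` is deprecated in favour of `-rm -r`, which is what the example uses.

```python
import shutil
import subprocess

def hdfs(*args, run=False):
    """Build (and optionally run) an `hdfs dfs` command; hypothetical helper."""
    cmd = ["hdfs", "dfs", *args]
    if run:
        if shutil.which("hdfs") is None:
            raise FileNotFoundError("hdfs executable not found on PATH")
        # check=True raises CalledProcessError if the command fails
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    # Equivalent of `!hdfs dfs -rm -r hobbit-out`, printed rather than executed
    print(" ".join(hdfs("-rm", "-r", "hobbit-out")))
```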

You can also see the various Hadoop resources, including the two active nodes, through this interface.
Once JupyterHub has started, the notebooks can also be accessed by going directly to port 8880 and using password = praxis.

Finally it is time to
6. Terminate the cluster!


Go to the cluster console, choose the active cluster and press the Terminate button. If termination protection is on, you will need to turn it off first.
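Since the AWS CLI is already configured, the cluster can also be terminated from the command line with `aws emr terminate-clusters --cluster-ids <ClusterId>`, using the ClusterId returned by the create-cluster command in step 3. The Python sketch below only builds that command from the returned JSON; the function name is hypothetical, and actually running it is left commented out. Termination protection, if enabled, still has to be switched off first.

```python
import json

def terminate_command(create_cluster_output):
    """Build the CLI call that terminates the cluster created in step 3."""
    cluster_id = json.loads(create_cluster_output)["ClusterId"]
    return ["aws", "emr", "terminate-clusters", "--cluster-ids", cluster_id]

if __name__ == "__main__":
    output = '{\n    "ClusterId": "j-2LW0S8SAX5OC4"\n}'
    cmd = terminate_command(output)
    print(" ".join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to really terminate
```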



Notes :
1. The same task can be done through the EMR console, without having to use the AWS CLI command line because most of the parameters used in this command can be passed through the console GUI. For example, look at this page.
2. Because of the problem with S3, we are storing our programs and data on the master node, where they are deleted when the cluster is terminated. Ideally, they should be placed in an S3 bucket using the --s3fs option.
3. The default security group created by the create-cluster command does not allow SSH on port 22. However, if such a rule is added, standard SSH commands can be used to access the master node and transfer files to it.
4. Tom Zeng's post says that SSH tunnelling is required. However, I did not need to use this process, nor follow any of the complex FoxyProxy setup, to get access. Not sure why: simple access to ports 8001 and 8880 worked fine. Mystery?

Spark with Python in Jupyter Notebook on a single Amazon EC2 instance

In an earlier post, I explained how to run a Python + Spark program with Jupyter on a local machine, and in a subsequent post, I will explain how the same can be done on an AWS EMR cluster of multiple machines.
In this post, I explain how this can be done on a single EC2 machine instance running Ubuntu on Amazon AWS.

The strategy described in this blog post is based on strategies described in posts written by Jose Marcial Portilla and Chris Albon. We assume that you have a basic familiarity with AWS services like EC2 machines, S3 data storage and the concept of key pairs, and that you have an account with Amazon AWS. You may use your Amazon e-commerce account, or create a new one on the AWS login page. This tutorial is based on Ubuntu and assumes that you have a basic familiarity with the SSH command and other general Linux file operation commands.

1. Login to AWS

Go to the AWS console, log in with your user ID and password, then go to the page with EC2 services. If you have not used AWS before, you should have 0 instances, 0 key pairs and 0 security groups.

2. Create (or Launch) an EC2 instance and use the default options, except for:
a. Choose Ubuntu Server 16.04 LTS
b. Instance type t2.small
c. Configure a security group - unless you already have a security group, create a new one. Call it pyspju00. Make sure that it has at least these three rules.
d. Review and Launch the instance. At this point you will be asked to use an existing key pair or create a new one. If you create a new one, you will have to download a .pem file to your local machine and use it for all subsequent operations.

Go back to the EC2 instance console and you should see your instance running:


Press the button marked Connect and you will get the instructions on how to connect to the instance using SSH.

3. Connect to your instance

Open a terminal on Ubuntu, move to the directory where the .pem file is stored and connect with:

ssh -i "xxxxxxx.pem" ubuntu@ec2-54-89-196-90.compute-1.amazonaws.com
(you will have a different host name for your instance)

From now on you will be issuing commands to the remote EC2 machine

4. Install Python / Anaconda software on the remote machine

sudo apt-get update
sudo apt-get install default-jre

wget https://repo.continuum.io/archive/Anaconda3-4.3.1-Linux-x86_64.sh

Get the exact URL of the Anaconda download by visiting the download site and copying the download link.

bash Anaconda3-4.3.1-Linux-x86_64.sh
Accept all the default options except this one, where you should say yes:
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/ubuntu/.bashrc ? [yes|no]
[no] >>> yes

Log out of the remote machine and log back in with:
ssh -i "xxxxxxx.pem" ubuntu@ec2-54-89-196-90.compute-1.amazonaws.com

5. Install Jupyter Notebook on the remote machine

a. Create a certificate in a directory called certs

mkdir certs
cd certs
sudo openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout pmcert.pem -out pmcert.pem

This creates a certificate file, pmcert.pem (not to be confused with the .pem key file downloaded to your local machine), and stores it on the remote machine.

b. Jupyter configuration file

Go back to the home directory and execute:
jupyter notebook --generate-config

Now move to the .jupyter directory and edit the config file:

vi jupyter_notebook_config.py
If you are not familiar with this editor, either learn how to use it or use any other editor that you are familiar with.

Notice that everything is commented out. Rather than uncommenting specific lines, just add the following lines at the top of the file:
#--------------------------------------------------------------------------------
c = get_config()

# Notebook config this is where you saved your pem cert
c.NotebookApp.certfile = u'/home/ubuntu/certs/pmcert.pem' 
# Run on all IP addresses of your instance
c.NotebookApp.ip = '*'
# Don't open browser by default
c.NotebookApp.open_browser = False  
# Fix port to 8892
c.NotebookApp.port = 8892
#--------------------------------------------------------------------------------
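If you would rather not edit the file by hand, a short Python script can prepend the same settings. This is a sketch under this post's assumptions: the config is the default `~/.jupyter/jupyter_notebook_config.py` created by `--generate-config`, the certificate is `/home/ubuntu/certs/pmcert.pem` and the port is 8892. The function names are hypothetical, and the block is added only once even if the script is run twice.

```python
import os

# The same lines as above, as they should appear at the top of the config file
SETTINGS = """\
c = get_config()
c.NotebookApp.certfile = u'/home/ubuntu/certs/pmcert.pem'  # cert from step 5a
c.NotebookApp.ip = '*'              # listen on all IP addresses of the instance
c.NotebookApp.open_browser = False  # no browser on a headless server
c.NotebookApp.port = 8892           # port used in the rest of this post
"""

def add_settings(config_text, settings=SETTINGS):
    """Return the config text with `settings` at the top (added only once)."""
    if settings in config_text:
        return config_text
    return settings + "\n" + config_text

def update_config_file(path=os.path.expanduser("~/.jupyter/jupyter_notebook_config.py")):
    """Rewrite the Jupyter config file in place with the settings prepended."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(add_settings(text))
```

On the instance, calling `update_config_file()` once after `jupyter notebook --generate-config` has the same effect as the manual edit.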

c. Start Jupyter without a browser, on port 8892

Move to a new working directory and start Jupyter:
mkdir myWork
cd myWork
jupyter notebook

You will get:
Copy/paste this URL into your browser when you connect for the first time,    to login with a token:
        https://localhost:8892/?token=70b8623ec5ecf7d7d2f8b38b45112a92ec036ad3f5ed8a1d

but instead of going to localhost, we will go to the EC2 machine's URL in a separate browser window:
https://ec2-54-89-196-90.compute-1.amazonaws.com:8892
This will throw security warnings (the certificate is self-signed), but ignore them and keep going until you reach this screen:


In the password area, enter the token that you got in the previous step, and you will see your familiar notebook screen.


6. Installation of Spark

Go back to the home directory and get the download URL of the latest version of Spark from this page:

wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar -xvf spark-2.1.0-bin-hadoop2.7.tgz 
mv spark-2.1.0-bin-hadoop2.7 spark210

Edit the file .profile and add the following lines at the bottom:
-----------------------------------------------------
export SPARK_HOME=/home/ubuntu/spark210
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export SPARK_LOCAL_IP=LOCALHOST
------------------------------------------------------
Make sure that you have the correct version of the py4j-x.y.z-src.zip file by looking in $SPARK_HOME/python/lib, where it is stored.
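Checking the py4j version can be automated. The sketch below assumes the layout of the Spark binary distribution unpacked above (`$SPARK_HOME/python/lib/py4j-*-src.zip`), falls back to `~/spark210` when SPARK_HOME is not yet set, and prints the matching PYTHONPATH line for `.profile`; both function names are hypothetical.

```python
import glob
import os

def find_py4j_zip(spark_home):
    """Return the path of the py4j source zip bundled with Spark."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))  # normally exactly one match
    if not matches:
        raise FileNotFoundError("no py4j zip matching " + pattern)
    return matches[-1]

def export_line(zip_path):
    """Format the .profile line for this zip, using its actual file name."""
    return ("export PYTHONPATH=$SPARK_HOME/python/lib/%s:$PYTHONPATH"
            % os.path.basename(zip_path))

if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME", os.path.expanduser("~/spark210"))
    try:
        print(export_line(find_py4j_zip(spark_home)))
    except FileNotFoundError as err:
        print(err)
```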

Log out of the remote machine and then log back in.

7. Running Spark 2.1 with Python

[The following step may not be necessary if your versions of Spark and Python are compatible. Please see the April 13 update on this blog for an explanation of this]

cd myWork
conda create -n py35 python=3.5 anaconda
(log out and log back in via SSH)
cd myWork
source activate py35

Now run pyspark and note that it is working with Python 3.5.2, so we are all set to start Jupyter again:

jupyter notebook
Note the new token (here, token=12e55cacf8cdcad2f8c77f7959047034b698f4b8f67b679a) that you get.

The Jupyter Notebook should now be clearly visible again at
https://ec2-54-89-196-90.compute-1.amazonaws.com:8892

Now we upload the notebook containing the WordCount program and the hobbit.txt input file, from the previous blog post.


We can then execute it:


This completes the exercise, but before you go, do remember to shut down the notebook, log out of the remote machine and, most importantly, terminate the instance.

8. Terminate the instance

Go to the EC2 Instance console and terminate the instance. If you do not do this, you will continue to be billed!



April 10, 2017

Raja Shashanka and the Calendar in Bengal

The origin of the Bengali calendar


On Saturday, 15th April 2017 Common Era, Bengalis in India, especially in West Bengal, Assam and Tripura, will celebrate the 1st Baisakh, or “Poila Boisakh”, 1424 Bengal Era (BE) -- the start of the Bengali New Year. Most of us are aware that the globally used Common Era, or Christian Era, starts with the birth of Jesus Christ in 1 CE, but what exactly is commemorated by the start of the Bengal Era? What happened in 1 BE?

There are two points of view.

Raja Shashanka is the first universally accepted ruler of a major part of the land mass that is associated with Bengal -- West Bengal & East Bengal / Bangladesh -- today. His capital was at Gaud (current Murshidabad) and he was a contemporary of Raja Harshavardhana of Kannauj (near Lucknow) in the West and of Raja Bhaskar Varman of Kamarupa (Assam) in the East. These three were the principal rulers of North India. While exact dates are not available, it is strongly believed that Raja Shashanka ruled in Bengal between 590 CE and 625 CE. If we assume that Raja Shashanka ascended the throne in 594 CE and that Bengal celebrates this as the start of the Bengali Era, then in 2017 CE the Bengali Era year should be 1 + (2017 - 594) = 1424 BE, which is exactly what it is on “Poila Baisakh” 2017. Hence the Bengal Era begins with the accession of Raja Shashanka to the throne of Gauda-Bengal.

Long after Raja Shashanka and the Hindu rulers of Bengal were dead and gone, Bengal came under Islamic rule when Bakhtiar Khilji evicted Lakshman Sen in 1206 CE. Subsequently, Bengal became a province of the Mughal (Mongol) empire, which followed the Islamic Hijri calendar; because this calendar was based on lunar months, it caused major administrative problems.

Agricultural revenue is tied to the harvest and it is most easily collected at the end of the harvest season when the farmer has money in his purse. Seasons in turn are tied to the position of the sun as defined by solar months that commence with the entry of the sun into the signs of the zodiac. So there is a one-to-one fixed connection between a solar month, the position of the sun and the seasons. For example, the spring or vernal equinox happens on 21 March of the Gregorian Calendar or on 1st Chaitra of the Saka Calendar, the official Government of India calendar, because both of these are solar calendars.

The Islamic Hijri calendar is based on lunar months where the start of each year varies widely across seasons -- in some years, the year starts in summer, in other years during the monsoon or in winter. So tax collection based on the Islamic year was a nightmare because the tax collector might arrive when the seeds had just been sown and the farmer would not have the money to pay his taxes. This would lead to endless arguments.

Akbar realised that the Hindu calendars, which were based on the solar months, were more useful for tax-collection purposes, because the year started on a fixed seasonal date. So he adopted the solar calendar, according to which the year of his coronation in 1556 CE was 1 + (1556 - 594) = 963 BE. Coincidentally -- and this was a huge coincidence -- 1556 CE was also 963 in the Islamic calendar. So, in order not to lose face by having to replace the unstable Islamic lunar calendar with the stable Hindu solar calendar, he adopted the Bengali solar calendar in 1556 CE, the year of his coronation, but instead of defining it as Bengal Era year 1 BE, he declared it Bengal Year 963 BE, so as to maintain the illusion that he was continuing with the Islamic calendar. Going forward, though, the administrative year was aligned to the traditional Bengali solar year, so that the seasons would begin on fixed dates.
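The year arithmetic used in this argument fits in a couple of lines. The sketch assumes, as the post does, that year 1 BE began in 594 CE, so the Bengali year whose Poila Baisakh falls in a given CE year is 1 + (CE - 594). It deliberately ignores the fact that dates between 1st January and Poila Baisakh still belong to the previous Bengali year.

```python
EPOCH_CE = 594  # assumed start of the Bengal Era (year 1 BE), as in the post

def bengali_year_starting_in(ce_year):
    """Bengali Era year whose Poila Baisakh falls in the given CE year."""
    return 1 + (ce_year - EPOCH_CE)

if __name__ == "__main__":
    print(bengali_year_starting_in(2017))  # 1424, the new year of April 2017
    print(bengali_year_starting_in(1556))  # 963, the year computed for 1556 CE
```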

So, the first, simple, explanation for the Bengali Era is that it starts with the ascendancy of Raja Shashanka to the throne of Gaud in 594 CE with the first year being defined as 1 BE. The alternate explanation is that it starts with the coronation of Akbar in Delhi in 1556 CE but with the first year being numbered 963 BE to maintain an artificial equivalence with 963 Islamic Era year that was prevailing at that time.

Now it is up to the reader to decide whether he or she wants to start the Bengali Era with the coronation of Raja Shashanka at Gaud, in 594 CE, or of Akbar at Delhi, in 1556 CE.

Actually Akbar would have got away with this sleight of hand of passing off the Bengali Era as being an extension of the Islamic Era but for the start date. Akbar had his coronation on 14th Feb 1556 CE and if the Bengali Era was based on this event then the first day would have been 14th Feb. But all Bengalis celebrate the new year, 1st Baisakh, on 14th April, when the Sun enters the constellation of Aries or Mesha. This clearly shows that the Bengali Era is actually rooted in the Hindu tradition of Solar years dating back to Raja Shashanka and antiquity.

But how do we know which point in the sky is the start of Aries? Where does the zodiac start?


The two zodiacs : Tropical and Sidereal


In Bengal, 1st Baisakh usually coincides with 15th April, when the Sun enters the constellation of Mesha (Aries). The Indian National Calendar approved by the Government of India, which is based on the Saka Era, defines the start of the year as 21st March, which is 1st Chaitra, when the Sun enters the constellation of Meena (Pisces). This leads to a strange inconsistency. If the Sun enters Mesha (Aries) on 15th April as per the Bengali calendar, then it must enter Meena (Pisces) on 15/16th March, but as per the Saka calendar, it enters Meena (Pisces) on 21st March. Why this gap?

To explain this anomaly, we need to know that there are TWO zodiacs, the tropical (sayana) zodiac and the sidereal (nirayana) zodiac, and the implications of this are explored in the rest of this post. [Warning: the rest of the post has a little mathematics, which you may like to read only if you are not scared of the devil in the detail.]

Consider a spherical coordinate system that is centred on the Earth but does not rotate with it, so that the “fixed” stars keep their positions from day to day. In this spherical coordinate system, every heavenly body is defined by three numbers -- the azimuthal angle, which shows the position along the equatorial circle, or longitude; the declination angle, which shows the position above or below the equatorial plane; and the distance from the centre of the Earth. In our assumption, all heavenly bodies are at the same uniform distance, fixed on “the sphere of heavens”, so the distance from the centre is immaterial. The only real variables are the azimuthal and declination angles, and they specify the position of every heavenly body.

There are two classes of heavenly bodies -- the “fixed” stars and the “wanderers” or “planets”. The “fixed” stars do not change their position in our spherical coordinate system, but the “planets”, which also include the Sun and the Moon, move around among the “fixed” stars as their azimuth and declination angles change with the passage of time.

For the purpose of the solar calendar, we will only consider the movement of the Sun as it travels around the Earth. Do note that there is nothing mathematically wrong in considering the Sun to be travelling round the Earth, as frames of reference can be changed without affecting the description of the physical reality. As the Sun moves round the Earth, its azimuth angle, or longitude, changes from 0 through 359 degrees and then back to 0 in one year, and in the same time its declination angle changes between -23 and +23 degrees as the seasons change from winter through spring, summer and autumn back to winter. The declination is 0 at the two equinoxes, when day and night are of equal length. So the Sun moves in a band around the Earth, and this band is divided into twelve sectors of 30 degrees each. Each of these twelve sectors is occupied by, or related to, one of the 12 constellations of “fixed” stars arranged in certain imaginary patterns -- Aries, Taurus, Gemini and so on.

A circle has neither a beginning nor an end, so while the Sun takes a year to complete this circle, there is no unambiguous way to define where exactly the circle -- and hence, by extension, the year -- starts. However, this starting point can be defined in two ways, leading to the existence of two zodiacs: the tropical and the sidereal.

In the tropical zodiac, the point on the circle where the Sun is at the vernal equinox, and its declination is 0, is defined as the starting point of the year, and its azimuth angle is defined as 0. This means that the tropical year starts at the vernal equinox, and this is traditionally associated with the entry of the Sun into the sign of tropical Aries -- that is, Aries as shown in the tropical zodiac.

In the sidereal zodiac, the point on the circle that is diametrically opposite the “fixed” star Spica -- also known as Chitra in India -- is considered to be the starting point of the year, where the azimuthal angle is defined as 0. This means that the sidereal year starts when the Sun is opposite Spica, and this is traditionally associated with the entry of the Sun into the sign of sidereal Aries -- that is, Aries as shown in the sidereal zodiac (we will refer to this as sidereal Mesha, to avoid confusion with tropical Aries, even though Aries and Mesha refer to the same physical constellation).

So now we have two circles, or two zodiacs, with two starting points, and these two starting points are approximately 23 degrees apart! This gap is known as the ayanamsa, and it keeps increasing with each passing year.

The sidereal year, the time taken by the Sun to start from the “fixed” star Spica and return to it, is 365.25636 days. The tropical year, the time taken by the Sun to move from its position at one vernal equinox to its position at the next vernal equinox, is 365.242189 days, or about 20 min 24 sec less. The difference arises because the axis of the Earth is not invariant but “wobbles” slowly (this is known as the precession of the equinoxes).

At some point in the past, in 285 AD, the position of the Sun at the vernal equinox was directly opposite the “fixed” star Spica. This means that the entry point of tropical Aries coincided with the entry point of sidereal Mesha. In that year, the tropical and sidereal zodiacs were identical. But since the tropical year is shorter than the sidereal year, the next tropical year started 20 min 24 sec earlier than the next sidereal year. With each passing year, the tropical year commenced an additional 20 min 24 sec earlier, until the cumulative gap between the respective starts of the tropical year and the sidereal year has grown to about 24 days in 2017.
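The drift described here follows directly from the two year lengths quoted above. The sketch below uses only those two figures and the 285 AD coincidence year (Lahiri's choice, which is debated at the end of this post). It gives about 20 min 24 s per year and roughly 24.5 days accumulated by 2017.

```python
SIDEREAL_YEAR = 365.25636   # days, Spica to Spica
TROPICAL_YEAR = 365.242189  # days, vernal equinox to vernal equinox
COINCIDENCE_YEAR = 285      # year (AD) when the two zodiacs coincided

def annual_gap_seconds():
    """How much earlier each tropical year starts than the sidereal year."""
    return (SIDEREAL_YEAR - TROPICAL_YEAR) * 24 * 60 * 60

def cumulative_gap_days(year):
    """Total drift, in days, accumulated between COINCIDENCE_YEAR and `year`."""
    return (year - COINCIDENCE_YEAR) * (SIDEREAL_YEAR - TROPICAL_YEAR)

if __name__ == "__main__":
    minutes, seconds = divmod(annual_gap_seconds(), 60)
    print("%d min %.0f s per year" % (minutes, seconds))
    print("%.1f days by 2017" % cumulative_gap_days(2017))
```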

But since all official solar calendars, including the Gregorian calendar used in the West and the Saka calendar officially used by the Government of India, are tied to the tropical year, the vernal equinox is fixed compulsorily on 21st March / 1st (tropical) Chaitra, when the Sun enters tropical Aries. But the Bengali calendar, which starts on 1st (sidereal) Baisakh, when the Sun enters sidereal Mesha, begins on 15th April of the Gregorian calendar, about 24 days later. The existence of two zodiacs, the tropical and the sidereal, is the reason for the gap of 5 days that was the starting point of this discussion.

In 285 AD, when the tropical and sidereal zodiacs were coincident, the vernal equinox, the entry of the Sun into tropical Aries and its entry into sidereal Mesha -- all three events -- would have happened on 21st March, which would also have coincided with 1st (sidereal) Baisakh.

If we keep the date of the vernal equinox compulsorily fixed at 21st March, then with the passage of time, the start of the sidereal year will occur at a later date every year. Conversely, if the start of the sidereal year is considered to be fixed by the arrival of the Sun opposite Spica, and its entry into sidereal Mesha, then the vernal equinox will be 20 mins 24 secs “earlier” each year, when the Sun has not yet reached sidereal Mesha, but is still in sidereal Meena. From this sidereal perspective, the vernal equinox that signals the start of the tropical year with the entry of the Sun into tropical Aries, has now pushed “back” from sidereal Mesha and into sidereal Meena ( or Pisces). Hence, as per western astrological practices, this is the Age of Pisces and after some more time we will move even further backward into the Age of Aquarius.

In Hindu astrology, the analysis of the horoscope is based on the positions of the planets in the sidereal zodiac. However, all astronomical calculations that are used to generate the ephemeris, the azimuthal or longitudinal positions of the planets, are based on the tropical zodiac. Since the sidereal zodiac is about 23 degrees ahead of the tropical zodiac at the moment, all planetary longitudes need to be reduced by this amount -- known as the ayanamsa -- before being shown on the horoscope. Western astrologers, on the other hand, work with the tropical zodiac and do not need this correction.
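The correction described in this paragraph amounts to subtracting the ayanamsa from a tropical longitude and then reading off which 30-degree sector the result falls in. In the sketch below the ayanamsa is a parameter: 23 degrees is used in the example only because it is the round figure quoted in this post, while the precise value depends on the year and on whose ayanamsa you accept. The sign names are the usual Sanskrit ones.

```python
SIDEREAL_SIGNS = ["Mesha", "Vrishabha", "Mithuna", "Karka", "Simha", "Kanya",
                  "Tula", "Vrishchika", "Dhanu", "Makara", "Kumbha", "Meena"]

def sidereal_longitude(tropical_longitude, ayanamsa):
    """Convert a tropical longitude (degrees) to a sidereal one."""
    return (tropical_longitude - ayanamsa) % 360

def sidereal_sign(tropical_longitude, ayanamsa):
    """Name of the sidereal sign (30-degree sector) containing the body."""
    sector = int(sidereal_longitude(tropical_longitude, ayanamsa) // 30) % 12
    return SIDEREAL_SIGNS[sector]  # the final % 12 guards against float round-off at 360

if __name__ == "__main__":
    # At the vernal equinox the Sun's tropical longitude is 0 degrees...
    print(sidereal_sign(0, 23))   # ...but it is still in sidereal Meena
    print(sidereal_sign(23, 23))  # it reaches sidereal Mesha only 23 degrees later
```

This is the same observation made earlier in the post: seen from the sidereal zodiac, the vernal equinox now falls in Meena rather than Mesha.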

Finally, the identification of 285 AD as the year when the vernal equinox coincided with Spica and the tropical and sidereal zodiacs were identical has been challenged. While this date and the ayanamsa of 23 degrees were defined by N C Lahiri, other astrologers claim that according to the Surya Siddhanta, the definitive classical text on astronomy, the year of coincidence should be 499 AD and the ayanamsa should be reduced accordingly. This is a big debate with no clear resolution in sight.

April 06, 2017

The Entangled Future of Man & Machine

First we had computers, then we had the World Wide Web and now we are talking about IoT, the Internet of Things. Computers are now pervasive and have occupied every nook and corner of life. Artificial intelligence seems to be growing bigger, faster and smarter every day. Science fiction writers have often spoken of machines taking over the world, but is that pure fantasy or is it only a matter of time before this fiction becomes fact? In this article, we will explore the genesis of artificial intelligence, examine how fast it is mutating and morphing into more advanced forms and finally speculate on what this means in terms of employment and other key indices of human society. But first, let us see...

How machines learn

When I was a member of the faculty at IIT Kharagpur, one of my senior colleagues told me that we should not teach students but, instead, help them learn. Can this idea be extended to machines?

Any spreadsheet user would know how to have a computer add two numbers. Why just add? You could do many other tasks, like finding a percentage or a net present value, as well. Obviously someone has written a computer program that does all this! Those who know computer programming would also know that writing the code to add two numbers is quite easy. In a high-level language like Python or Java, it takes just one line of code. But in a low-level language like Assembler, or in binary machine code, one needs a large number of rather arcane instructions, the latter written as a series of 1s and 0s [http://bit.ly/add2numbers]. Irrespective of the programming language used, a program that performs any task on a computer, from adding numbers to showing a YouTube movie, is a series of explicit instructions given by a human programmer. However complex the task, it is always a human who teaches a computer how to perform it. This is one of the fundamental tenets of computer programming … or was, until the emergence of machine learning. Instead of teaching a computer, can we make it learn on its own?

There are many facts that a child is taught in school -- like how to add two numbers or when Raja Harshvardhan ruled in Kannauj. But there is much more that he learns on his own -- like recognising his mother or realising that toffees are good to eat but stones are not. How people learn, for example, to recognise a friend in a crowd or choose the best move in a game of chess, is something that has baffled computer scientists for a long time. That is why, even though we have had computer programs doing incredibly difficult things -- like landing a spacecraft on Mars -- they have had immense difficulty in performing apparently simple tasks like crossing a busy city road. This has now triggered a new line of thought: instead of teaching computers how to perform, we need to equip computers with the ability to learn how to perform!

While a lot of human learning involves memorising facts, the really complex or intelligent skills that people acquire are based on a series of trials and errors that they make as soon as they become aware of their environment. Whether in babies or laboratory rats, intelligence is acquired by performing a task and determining whether the outcome was good or bad -- was there a reward or a punishment? This conscious feedback loop, spread over days or even years, gives man the ability to do intelligent tasks like recognising a friend in a crowd even if he has grown a beard or lost his hair. After despairing for years about what kind of instructions must be given to a computer to do similar tasks, scientists have looked into the human brain -- the biological marvel that sits inside the cranium -- for ideas and have finally hit upon a way to address these challenges.

The animal brain is an electrochemical device that consists of, literally, billions of cells called neurons that can sense, generate and transmit electrical signals. Each neuron can be visualised as a little blob with a number of wires sticking out of it. Most of these wires, called dendrites, can sense an electrical signal, while one, called the axon, can generate an electrical signal depending on the signals that it has received on its dendrites. The signal generated on the axon of one neuron feeds into the dendrites of other neighbouring or distant neurons, creating a complex electrical circuit that is constantly sending electrical charges rushing around the brain. Neurologists have a rough idea of what kind of electrical activity is associated with the corresponding human behaviour, and it is believed that memory is created or defined by the way these billions of neurons are connected to and influence each other.

Computer scientists have merged these two mechanisms -- the reward-and-punishment mechanism from behavioural science and the input-output electrical mechanism from neurology -- to create a software program that mimics the behaviour of biological neurons. Such a software program is called an artificial neural network (ANN), and is the basis for almost all intelligent software including AlphaGo -- that can beat humans in board games, Watson -- that can diagnose clinical diseases, or the Google Self Driving car -- that has travelled thousands of miles with an accident rate that is far less than that of humans.

While we already know that a computer can be taught to add two numbers, let us now see how it can be taught to learn to add on its own! To build a general purpose ANN, one starts by simulating nodes, or artificial neurons, each with its own set of inputs and an output. Each input is associated with a parameter, or number, called a weight, and each output depends on the inputs and another parameter called the bias. The output from each node is sent to other nodes to mimic electrical signals in the biological neural network. The number of nodes, along with the numerical values of their parameters, defines the ANN. When the ANN receives an input, it generates an output that depends on the specific values of the weight and bias parameters. If the values are chosen at random, the output is wrong, but with the right values, the network will generate the correct answer. Learning consists of determining the right values. The beauty of this approach is that almost all problems can be addressed with this structure -- only the number of nodes, weights and biases need to change.
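The node described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplest possible node: a weighted sum of its inputs plus the bias, with no activation function. The function name `node_output` and the sample numbers are hypothetical.

```python
def node_output(inputs, weights, bias):
    """Output of one artificial neuron: the weighted sum of its inputs plus a bias.

    Real networks usually pass this sum through a non-linear activation
    function; that step is left out here to keep the sketch minimal.
    """
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# With arbitrary parameter values the node gives an arbitrary answer ...
print(node_output([2.0, 3.0], weights=[0.3, -1.2], bias=0.5))
# ... but with weights [1, 1] and bias 0 it happens to add its two inputs.
print(node_output([2.0, 3.0], weights=[1.0, 1.0], bias=0.0))  # 5.0
```

The same function serves for any problem; only the values passed in as `weights` and `bias` decide whether the answer is right.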

To teach a computer to add, we provide it with training data consisting of thousands of addition problems along with their correct answers! Initially, the ANN has random parameter values; as it compares its output with the given correct answer and determines the error, it keeps changing the parameter values until, after thousands of attempts, the error becomes very small. Thus the machine discovers, or arrives at, the correct parameters and can be said to have learnt addition. Now it can solve new addition problems correctly even though no human programmer gave it explicit instructions to add or even the right parameter values. The programmer’s skill lies in specifying how the parameters should be changed so that the error is minimised in the fewest trials. What is amazing is that a similar way of changing parameters can help the machine solve many other, very different problems.
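The training loop described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: it assumes the network is a single linear node with two weights and a bias, and that the parameters are adjusted by stochastic gradient descent on the squared error, which is one common way of specifying how the parameters should be changed. The function `train_adder`, the training range of -1 to 1 and the learning rate are illustrative choices.

```python
import random


def train_adder(num_examples=1000, epochs=50, learning_rate=0.01, seed=0):
    """Learn w1, w2 and bias so that w1*a + w2*b + bias approximates a + b."""
    rng = random.Random(seed)
    # Training data: addition problems together with their correct answers.
    pairs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(num_examples)]
    data = [(a, b, a + b) for a, b in pairs]

    # Start from random parameter values, as described in the text.
    w1, w2, bias = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)

    for _ in range(epochs):
        for a, b, answer in data:
            prediction = w1 * a + w2 * b + bias
            error = prediction - answer
            # Gradient-descent step: move each parameter against its share of the error.
            w1 -= learning_rate * error * a
            w2 -= learning_rate * error * b
            bias -= learning_rate * error
    return w1, w2, bias


w1, w2, bias = train_adder()
print(f"w1={w1:.3f}, w2={w2:.3f}, bias={bias:.3f}")  # close to 1, 1 and 0
print(w1 * 123 + w2 * 456 + bias)  # close to 579, a problem it never saw
```

Because the learnt rule is simply `w1*a + w2*b + bias` with both weights near 1 and the bias near 0, it also works on numbers far outside the training range, which is the sense in which the machine has learnt addition rather than memorised the examples.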

This is meta learning. The machine has learnt to learn and can now address many other hard problems like handwriting recognition, face recognition, detecting criminal behaviour like financial fraud and computer hacking, playing chess and other board games, medical diagnosis, weather forecasting, road navigation and even reading human thoughts! In all such cases, the program has to cycle through thousands of pre-solved problems until it discovers the parameter values that make it generate correct answers. Sometimes this training process is explicit -- as when you give it a large number of problems and their correct answers -- but sometimes the training is implicit.

When you “tag” a picture of your friend on Facebook, you are inadvertently giving Facebook a correct answer to a face recognition problem that helps train its software. Similarly, when you click on a link in Google search, you are identifying the most relevant answer, and so Google learns about your personal preferences and can answer your next query better! In general, any action that you take on Facebook, whether a like, an emoticon, or a phrase that you type as a post or a comment, is being used by the ANN to learn more about you so that it can predict your next behaviour -- which could be anything from accepting friend requests to clicking on ads!

Computers have long been used to perform mathematical, process automation and other well structured tasks. But despite great increases in speed and memory, tasks that are ambiguous or unstructured were never addressed adequately. Many people claimed that intelligence -- something evident in children or even animals -- is something that mere machines can never acquire. ANNs, which can be trained to learn, have now helped us solve unstructured problems that were once considered intractable.

But unlike humans, machines can learn very fast -- working out 100,000 addition problems is almost impossible for a man but is trivial for a machine. So is the case with reading through all the articles in Wikipedia. If learning is the key to intelligence, as is commonly understood here, and if machines can learn much faster than humans, then does it mean that machines will become more intelligent than humans?

Tipping into the Singularity


In his 2000 bestseller, The Tipping Point, Malcolm Gladwell defined the tipping point as “the moment of critical mass, the threshold, the boiling point”, at which a small change, a sudden trigger, can usher in a huge change in the larger society. The phrase originates in the study of epidemics and refers to the moment when a virus reaches a certain critical mass and then begins to spread at an accelerated rate. While Gladwell restricts himself to the analysis of social trends like crime and fashion, Ray Kurzweil, in a series of books on the technological singularity, predicts that human society as a whole is just about ready to transform itself in a way that is perhaps inconceivable today.

The trigger in this case is artificial intelligence or its alter ego, machine learning. The hypothesis is that society will pass through this technological singularity when non-biological intelligence transcends biological intelligence and changes not just how we live and move about, but how we think and what we believe. But is this hypothesis sustainable? Or is this another big hype -- one that has been with us since the dawn of the science fiction era?

Twenty years ago, Deep Blue, IBM’s chess playing computer, created history by beating the world champion, Garry Kasparov, in 1997. Last year, AlphaGo, a computer program based on an artificial neural network (ANN) and built by DeepMind, a Google company, “learnt” to play Go, a game that is considered to be far more difficult for computers, and beat Lee Sedol, one of the highest ranked professional players in the world. In the first week of January 2017, AlphaGo, playing anonymously under the handle “Master”, ran through China’s online Go websites and beat almost every top ranked player with contemptuous ease. Playing with inhuman speed, it compiled an unbroken 60-0 winning streak, and among its victims was Ke Jie, the reigning world champion. More than the defeats themselves, its style and the strategies that it evolved on its own are so different from those used by humans that the world champions were awestruck. After losing to Master, Ke admitted on social media: “After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong, (and) I would go as far as to say that not a single human has touched the edge of truth of Go.” Playing in a manner completely different from humans, it bewildered opponents with apparently foolish moves that placed pieces at outrageously unconventional positions, so much so that one player, Gu Li, noted: “I can’t help ask, one day many years later, when you find your previous awareness, cognition and choices were all wrong, will you keep going along the wrong path or reject yourself.”

The key elements of the computer program were not written by humans but were discovered by the program itself as it learnt to play Go by studying millions of positions from online games and by playing against itself, in the manner explained earlier.

Very similar is the case of Interlingua, a new language that has evolved, or emerged, out of Google’s automatic language translation software. After being trained to translate, say, French to English and then English to Hindi, the software has learned how to translate from French to Hindi even though it was never “trained” to do so. This new language, or new way to represent ideas, has emerged not from the minds of programmers but from the neural network architecture that drives the software. This is an emergent phenomenon, similar to the appearance of language in a primordial human society.

Since its tentative beginnings in the 1950s, artificial intelligence technology took about 40 years to reach the level of maturity needed to beat a human chess champion, but the next big leap, learning to play Go, took less than 20 years. This is the law of accelerating returns -- where key jumps in human or biological ability happen at increasingly shorter intervals, or where the gap between innovations shortens exponentially, as shown by Ray Kurzweil:


[Graph #1. Source: Ray Kurzweil, via Wikimedia Commons]

This graph plots the time gap between key events in the history of human social evolution against the historical time frame. It shows a neat straight line because it is a log-log graph, in which the logarithms of both the X and Y values are plotted. Similar graphs drawn with key events identified by others show a very similar downward trend.
However, if we convert the historical time frame to a linear scale, then the drop in the time between two successive key events is clearly precipitous, and this is what leads us to the concept of technological singularity.
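Why a straight line on a log-log graph means steadily shrinking gaps can be seen with a little algebra. As an illustrative assumption, not a claim about the actual data behind the chart, suppose the gap Δt until the next key event is proportional to a power k of the time T before the present:

```latex
\Delta t = c\,T^{k}
\quad\Longrightarrow\quad
\log \Delta t = \log c + k \log T
```

On log-log axes this is a straight line with slope k. With k = 1, for example, an event 10,000 years ago is followed by a gap ten times longer than an event 1,000 years ago, so the gaps shrink tenfold with every tenfold step towards the present.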

Those who are familiar with Moore’s Law would know that the density of transistors on computer chips has been doubling every two years -- with a corresponding fall in prices and increase in computation power. This exponential pace of technology has been the driver behind the inexorable and astonishing growth of computers. While Moore’s law operates on computer chips and has been valid for the past 50 years, these charts show that human innovation has been accelerating exponentially across an even bigger time scale. Each human invention, like the ability to walk upright, the use of tools, language, agriculture, writing, printing, industrial machinery, electricity, computers and the internet, has reduced the time necessary to reach the next key invention. For example, calculations that took Kepler years to track the movements of planets can now be done by a school student on his desktop, and with Google search it is far easier for someone to tap into the knowledge of others and create a new piece of technology than it was 100 years ago.
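The arithmetic behind Moore’s Law shows how quickly doubling compounds. Writing N(t) for transistor density after t years (notation introduced here for illustration) and taking the doubling period of two years over the 50-year span mentioned above:

```latex
\frac{N(t)}{N(0)} = 2^{t/2}, \qquad
\frac{N(50)}{N(0)} = 2^{50/2} = 2^{25} = 33{,}554{,}432 \approx 3.4 \times 10^{7}
```

An idealised doubling every two years for 50 years therefore multiplies density by more than 30 million; real chips deviate from this clean curve, so the figure is only indicative.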

In another view of this same chart, we should see an exponential -- almost magical -- growth in the human ability to address problems that were thought impossible. After millions of years of being bound to the ground, man learnt to fly and then, in less than 70 years, he was on the moon, and today he is planning to go to Mars. So is the case with industrial machinery, telecommunications, healthcare and a host of allied fields. We are at, or very close to, an inflexion point -- a tipping point -- that will put us on a trajectory that leads to what seems to be a world of magic that may include immortality, omniscience or anything that does not violate the fundamental laws of physics!

A key characteristic of the tipping point would be the blurring of the borders between the biological and the non-biological world. Just as the industrial revolution merged the abilities of man and machinery to create the superhuman capability to manage enormous quantities of materials and energy with, for example, cranes, excavators and rocket engines, so does the digital revolution merge the abilities of humans and computers to create a similar superhuman capability that manages equally enormous quantities of information. Continuing with this analogy, we see that just as industrial machines today are far more muscular than humans in their ability to lift loads or travel faster and farther, so the next generation of intelligent machines, powered by artificial super intelligence, would far outstrip the mental ability of humans -- as demonstrated by AlphaGo.

This, then, is the singularity: the point in human history, which human society will soon pass through, after which the intelligence of machines will surpass that of humans. This superior intelligence, coupled with the ability to handle gigantic quantities of information at superhuman speed, will have a cascading impact on the emergence of new ideas that “will abruptly trigger runaway technological growth, resulting in unfathomable changes to human civilization.”

This explosion of intelligence, this chain reaction where each level of intelligence gives rise to an even higher level of intelligence, exhibited either by machines or by a hybrid cyborg of man and machine, at faster and faster speeds or at shorter and shorter intervals, was first postulated by John von Neumann, a pioneer of, among other things, the digital computer. Von Neumann coined the term singularity but did not live to see technology reach anywhere near the level that matched his postulates. Subsequently, I. J. Good, Vernor Vinge and eventually Ray Kurzweil elaborated the concept. However, equally well known people like Paul Allen, who co-founded Microsoft, and Gordon Moore, whose Moore’s Law is a classic example of this exponential progress, have raised doubts about the plausibility of this concept. But with the rapid growth of artificial intelligence, as evident in the spectacular success of AlphaGo, it seems that the singularity is indeed near, and despite a wide range of predicted dates, the median value of its estimated arrival is the year 2040.

Once we pass through the singularity, how will the world look after 2040? Would this non-biological intelligence help create new biological, non-biological or hybrid “life-forms”? Would these life-forms spread out across the solar system and the galaxy? But before we look at such big questions, let us explore something that is of immediate concern -- the job market.

The Future of Employment

When computers were first introduced in India, there was a lot of concern that this would lead to widespread unemployment. In West Bengal, and elsewhere, Communist labour “leaders” led huge agitations against this new technology with the slogan “automation must be stopped with rivers of blood”, and employees of banks and commercial establishments went on strike to prevent the installation of computers. But in reality the introduction of computers did not lead to any major disaster. The number of jobs lost to computers and automation was more than offset by the number of new jobs created, not only in the IT industry but also in many new age businesses that were based on computer technology. The number of people needed to use computers or write programs for them was more than the number of people who became redundant. So while certain individuals were retired or laid off, the overall number of people who were gainfully employed increased and the economy responded with buoyant optimism.

That was then; what will it be now? Will the new technology based on powerful instances of artificial intelligence create more jobs than it destroys? Can our experience with an earlier generation of technology be extrapolated in time, through the technological singularity where machines become more intelligent than men? Unfortunately, the answer to both questions seems to be “No”.

While robots are not yet in widespread use in India, we can look at what is happening in other countries that are ahead of us on this path. Between 2000 and 2010, 5.6 million manufacturing jobs were lost in the US and Canada. Of these, only 15% were lost to overseas competitors, mainly China, while the other 85% were due to what is euphemistically referred to as “productivity growth”. This means that humans were replaced by machines or robots. However, this decline in employment did not result in lower production. On the contrary, in the last 20 years, the value added by US factories has, after adjusting for inflation, grown by nearly 40% to reach a record US$ 2.4 trillion. On the street, McDonald’s introduction of self-service kiosks in response to popular pressure for a $15/hour wage is a vivid example. This is jobless growth, where the economy expands without a corresponding increase in employment, and it is perhaps one reason why many middle-class Americans in the rust belt states who have been rendered unemployed and unemployable have aggressively turned against the establishment and voted for Donald Trump.

Many Trump supporters believe that foreign countries are stealing their jobs, but the situation is not greatly different there either. The BBC has reported that Foxconn, the Taiwanese company that manufactures products in China on contract for Apple and Samsung, has recently replaced 60,000 workers with robots, and many other companies in the Kunshan region, where Foxconn factories are located, are likely to follow suit. Since 2013, companies in China have purchased more industrial robots than companies in any other country, as thousands of firms turn to automation in a drive that has been backed by the Chinese government in a desperate bid to remain competitive in manufacturing. Can the “Make-in-India” movement avoid this? Unlikely, because no army can stop an idea whose time has come!

The services industry, which is of more immediate concern to India, is no better off. Thanks to better voice and natural language recognition techniques, call center operators -- the backbone of India’s BPO / ITES success story -- can be replaced by artificial intelligence driven bots that can do a far better job of patiently listening to customer problems and offering solutions. In fact, a whole range of service jobs are now at risk, and these include, but are not limited to, cooks and chefs, medical doctors, surgeons, pharmacists and pathology laboratory staff, security guards, retail salespersons, receptionists, bartenders, farmers, truck, bus and taxi drivers, and even those who perform unstructured tasks like journalists, accountants and insurance claims adjusters. This may sound too futuristic and science-fictionesque, but as we have seen earlier, the fierce acceleration of change makes it impossible to wish away these dire, Cassandra-like predictions on the future of employment. Amazon Go, a new kind of store that has no cashiers or checkout counters, is an example of what a typical retail store could be like in the future. Its Just-Walk-Out technology allows a customer to walk in, pick up products from the shelves, look at them, return them if necessary, and then simply walk out without any formal checkout procedure and yet be billed automatically on his credit card. This is not magic, this is not the future. This is here and now. Such a store is actually operating in Seattle on a trial basis for Amazon employees, and it is simply a matter of time before it becomes a mainstream technology, just like the driverless car technology pioneered by Google, which is proliferating across the US and Europe and is about to be tested even by Tata Elxsi in Bengaluru.

When bank workers were replaced by banking software, there was a big need for computer programmers to build and maintain that software. When retail stores were replaced by e-commerce sites, they needed people to build and maintain e-commerce software and also an army of delivery boys to cater to an expanding clientele in previously unreachable mofussil areas. But when jobs are lost to robots and artificial intelligence systems, the number of jobs created is only a fraction of those lost. Unlike computer software, industrial robots are increasingly built not by humans but by other robots that are designed and programmed by a handful of very smart humans. So a large number of low end jobs are replaced by a few high end specialist jobs. This makes eminent economic sense for both the users and manufacturers of robots but leaves the newly unemployed worker distressed and angry.

While robots can always build robots, it was thought that at least the programming of such automated systems would be done by a few highly skilled humans -- but even this relief may be short lived. In January 2017, the MIT Technology Review reported that Google is building systems that not only demonstrate AI in, say, driving cars, but -- like a snake recursively swallowing its tail -- actually build the next-generation systems that demonstrate more AI. This is the eureka moment, comparable to the emergence of the self-replicating DNA molecule that marked the beginning of physical life. As AI systems build more advanced AI systems, they can take on a life, a non-biological life, of their own. So the need for human beings becomes even more insignificant and inconsequential in the economic systems of tomorrow’s world.

But then what do we do with humans who are no longer necessary for the economy to generate goods and services? For whom would the “economy” generate goods and services? And who will pay for these goods and services? These are difficult questions that will crop up as humanity moves into the uncharted territory of the post-singularity era. An obvious way to contain the rising tide of resentment against unemployment would be to have strong social security systems like Universal Basic Income (UBI) or even NREGA that push money into people’s pockets without them having to do any work -- because there really is no useful work left that they could do.

Does this mean that people will simply stay at home and play cards? Or will they get drunk and create mischief? An idle mind could be the devil’s workshop. What kind of sociological and psychological problems will this lead us towards? Once again we have questions but hardly any definitive answers. In the utopian scenario, we envisage that man will increasingly be involved in cerebral, cultural and entertainment activities -- art, music, literature, physical sports, cinema, virtual and augmented reality games, sex -- while machines, robots and AI systems will do the “dirty” job of keeping the economy running so that it generates a distributable surplus. The other, dystopian scenario would be a descent into anarchy as the gap between the unemployable poor and the talented rich becomes bigger and more bitter. We are looking at gated communities, or huge walled towns, well endowed with sustainable sources of food and energy, managed efficiently with high technology and defended with highly effective robotic systems against an angry, anarchic and violent outer world ruled by rogues and brigands.

These two extremes that we can atavistically think of are reminiscent of the middle ages -- the first, benevolent scenario resembles a society of serfs and noblemen, while the second, malevolent scenario reminds us of medieval cities as islands of civilisation and governance in an otherwise anarchic and lawless countryside. But perhaps, with the emergence and eventual predominance of artificial intelligence and non-biological “life” forms, our existing models of social behaviour will cease to be reliable predictors of the hybrid, human-machine civilisation that we cannot envisage at the moment but are about to bequeath to ourselves anyway.

The accelerating pace of technology cries “Havoc” and has let slip the dogs of rapid and irreversible change. The genie is out of the bottle, and new “life” forms, with a different type of intelligence, are about to be released first into the economy, then into society and finally into the planet as a whole. Would this wipe out humans? Or would man evolve and adapt himself to co-exist with this new “species”? Would he retain his position as the master at the top of the global ecosystem? Or would this position be taken over by a machine or a hybrid cyborg that combines the best -- or worst -- of both man and machine? The answer lies in the womb of futurity.


This article originally appeared in Swarajya, the magazine that reads India right

March 25, 2017

Time Travel

A historical perspective

“What is time?” asks James Gleick in Time Travel: A History. “We know that it is imperceptible. It is immaterial. We cannot see it, hear it or touch it. Time is what clocks measure. But what is a clock? An instrument for the measurement of time. The snake swallows its tail again!” This is the kind of circular logic that the author tries to break out of in this engaging foray into one of the most mysterious concepts to have intrigued man since the nineteenth century.

Scholarly journal papers that announce new breakthroughs invariably begin with a review of past literature. Once in a while, the literature review becomes bigger than any new concept that is being announced, and in extreme cases we end up with what is known as a review paper that merely surveys the subject without offering anything new. So it is with this book. Rather than offering any new insight or even a clear exposition of any specific point of view, the author leads us through a grand tour of the various perspectives that scientists, philosophers and literary figures have explored in their respective efforts to put a structure around this most intriguing yet elusive idea of time. Given the breadth of subjects addressed, the depth is limited, but at least it creates a map of the terrain that the reader can explore on his own. This is the true value of the book under review.

Time as a matter of discussion entered the public domain with the HG Wells classic, The Time Machine, which set the tone for a whole genre. A science fiction story set in a different era -- complete with gadgets and behaviours that are dramatically different from what the author and his readers are accustomed to -- is one way of travelling into the future, or the past. But the real flavour of time travel is revealed when the protagonists move forward and backward in time, into other eras or epochs. Such travel creates contradictions, like a man meeting his own self in the past or the future, or murdering his own father and negating his own existence, that form the backbone of many interesting novels discussed in this book.

Authors writing about time travel usually drift into philosophical discourses on the nature of time. Is it like a river? And if it is, is the observer standing on the bank or on a boat floating along with the river? Is it just the flexibility of the English language that allows us to save time, to spend time or even to waste time, or do these verbs connect with certain real properties of time? These are questions that appear again and again, but answers remain elusive to the original authors, the current author and certainly to the reader. In fact the author admits: “I doubt any phenomenon .. has inspired more perplexing, convoluted and ultimately futile philosophical analysis than time travel has.”

The book becomes more interesting when it eventually moves into science. The publication of The Time Machine by HG Wells was nearly simultaneous with some very serious scientific study of time as a physical dimension that eventually culminated in Einstein’s relativity. This, paradoxically, demolished the concept of simultaneity that forms the basis of all mechanisms to measure time. All laws of physics, with the exception of the second law of thermodynamics, are indifferent to the direction of time and, in principle, should allow people to move back and forth in time as they do left and right, or up and down. But of course, the ability to do so comes with the paradox of going back to the past and changing the course of history, and hence it is ruled out, not by science but by logic. However Gödel, the man who had upended the apple cart of mathematics with his incompleteness theorems, showed that such situations are not logically impossible and that there could be physical worlds where there is no logical bar on time travel. This, along with the loss of simultaneity, leads to the concept of retrocausation, where effect “precedes” cause, and makes us wonder whether our language can support a discussion of such constructs.

Unfortunately, most of these deeper concepts are glossed over as the author regales us with descriptions of time travel that appear in various literary works. These I am sure are worth reading, not to understand the concept of time, but for the sheer pleasure of reading well written novels.


This review originally appeared in Swarajya.

February 18, 2017

Faster, Cheaper .. CRISPR

Let us imagine a quaint little colony where you and I live in peace, but this peace is very often disturbed by outsiders who come in and dump garbage or play loud, ear-splitting music. What do we do? We collect photographs of the items that have created a nuisance in the past -- garbage bins, sound systems -- and store them in a photo album. We hire guards, give them copies of photographs from this album and ask them to check every visitor to see if he is carrying any of these unwanted items. When a guard finds a visitor carrying anything that matches a picture in the album, he simply destroys the item or might even shoot the visitor. If a new culprit enters the premises with a new disturbing item and is somehow overpowered by the community, then a picture of the new item is added to the album so that, in future, the guards can identify, destroy or deactivate it if it is brought back again!
image from ScienceMag

This dramatised scenario is based on what happens when a Streptococcus bacterium is attacked by a virus. Known hostile viruses, which are identified by a unique sequence of bases, or nucleotide molecules, in their DNA, are matched against a corresponding sequence stored in the bacterium’s DNA. Then a specific enzyme, the “gun”, is placed exactly on the matched part of the virus DNA and a “shot” is fired to break the virus DNA at the point of the match. The broken virus DNA is disabled, so the virus loses its ability to infect and the bacterium is saved. When a new virus, whose DNA signature the bacteria do not yet have, threatens the colony, some of the bacteria die, but those that survive keep a copy of the virus DNA signature in their own DNA, to be used against future attacks. To be technically correct, the virus DNA is not matched directly against its image stored in the bacterial DNA but against its complement, an RNA fragment created from the DNA. This is like the neighbourhood guard being given a complementary negative of the image from the photo album, not a photocopy.
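The guard-and-album mechanism can be sketched as a toy program. This is a deliberately simplified model built on assumptions: DNA is treated as a single string of letters, the guide RNA is built as the base-by-base complement of the stored spacer (A pairs with U, T with A, G with C, C with G), and real-world details such as the short PAM site that Cas9 also needs next to its target are ignored. All names (`make_guide_rna`, `find_target`, `BacterialMemory`) and the virus sequence are hypothetical.

```python
# Base pairing: which RNA base pairs with each DNA base, and the reverse lookup.
DNA_TO_PAIRING_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS_WITH_DNA = {"U": "A", "A": "T", "C": "G", "G": "C"}


def make_guide_rna(spacer):
    """Build the RNA 'negative' of a stored spacer, one base at a time."""
    return "".join(DNA_TO_PAIRING_RNA[base] for base in spacer)


def pairs_with(guide, dna_stretch):
    """True if every base of the guide RNA pairs with the matching DNA base."""
    return len(guide) == len(dna_stretch) and all(
        RNA_PAIRS_WITH_DNA[r] == d for r, d in zip(guide, dna_stretch))


def find_target(guide, virus_dna):
    """Slide the guide along the virus DNA; return the first pairing position, else -1."""
    n = len(guide)
    for i in range(len(virus_dna) - n + 1):
        if pairs_with(guide, virus_dna[i:i + n]):
            return i
    return -1


class BacterialMemory:
    """The 'photo album': spacers copied from viruses that attacked before."""

    def __init__(self):
        self.spacers = []

    def recognises(self, virus_dna):
        """True if any stored spacer yields a guide that pairs with this virus."""
        return any(find_target(make_guide_rna(s), virus_dna) >= 0 for s in self.spacers)

    def survive_attack(self, virus_dna, start, length=20):
        """After surviving a new virus, keep a snippet of its DNA as a new spacer."""
        self.spacers.append(virus_dna[start:start + length])


memory = BacterialMemory()
virus = "ATGCGTACGTTAGCCGATAGGCTTACGATCGA"
print(memory.recognises(virus))  # False: a new, unknown virus
memory.survive_attack(virus, start=5, length=12)
print(memory.recognises(virus))  # True: recognised on the next attack
```

Because the guide is the complement of the spacer, it pairs exactly with the stretch of virus DNA that the spacer was copied from, which is the “complementary negative” of the photo-album analogy.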

This defence mechanism, which bacteria have evolved over eons to protect themselves against hostile viruses, is the backbone of a radical new technique called CRISPR/Cas9 that has swept through the world of biotechnology and revolutionised the way scientists modify the genes that define life and determine the characteristics of living organisms.

We have heard many horror stories about genetically modified organisms (GMOs), but genes have been modified, both naturally and artificially, for many, many years. Random mutations, or changes in genes, happen naturally and are propagated into progeny through the reproductive process, while selective breeding of animals and plants is an example of artificial modification. But in all these cases, there is a lot of trial and error involved, and even when things work well, it takes multiple generations before the effect of the new genes becomes evident. Making similar changes directly in a laboratory environment may be a little easier, but not much. Current broad brush techniques are slow, cumbersome and error prone, and more often than not fail to achieve the desired goals. CRISPR/Cas9 promises to change this process so radically that it was widely tipped to win the Nobel Prize in 2016, but unfortunately it did not.

The genome, the sum total of all genetic material in any organism, is like a book written with only 4 letters, namely A, C, T and G. Actually, each letter represents a base, an organic molecule. Specific sequences of letters form words. Some sequences of these words are irrelevant, but other sequences form meaningful sentences that describe a specific recipe. In biological terms, a specific collection of meaningful sentences is called a gene, and the recipe defines how the gene is expressed, that is, how it produces a specific protein. These protein molecules define how the living organism looks and behaves, and their presence or absence can cause or prevent many diseases. The ultimate goal of genetic engineering is to alter these sequences of letters, or bases, in the genome of an organism so that beneficial outcomes, like disease resistance, are enhanced and malicious outcomes, like cancerous growth, are inhibited.

But making these changes is not easy. The human genome can be viewed as a long chain of at least 3 billion letters spread over 23 chromosomes, or chapters, if we persist with the analogy of the genome as a recipe book. Only a small fraction of these letters, roughly 1 to 2 percent, lie in genes that code for proteins; much of the rest was long dismissed as junk. To edit, or modify, an existing gene, any tool has to first locate its corresponding sequence of bases, truly a needle in a haystack, then disrupt that sequence and, if possible, replace it with another.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) refers to short, identical sequences of bases in the genome that are separated from each other by unique sequences, called spacers, each about 32 bases long. These spacers, first found in bacteria that fight off invading viruses using the mechanism described earlier, are copies of fragments of DNA from viruses that attacked the bacterium in the past. Based on each such spacer sequence, the cell itself, and now a scientist in a laboratory, can create an RNA fragment, called the guide RNA or gRNA, that attaches itself to a target DNA, either in a virus or in any other organism the scientist wants to target, precisely at the position where the sequence matches that of the original spacer. This is like walking down a street until you see the shop shown in the photograph in your hand. Wherever the gRNA stops, its helper, a protein called Cas9 (CRISPR associated protein 9), also stops, attaches itself and, like a wire-cutter, cuts the target DNA. The cell repairs this cut, but not perfectly, so the sequence of bases gets changed, the recipe becomes unreadable, the gene is disrupted and the corresponding protein cannot be produced. This two-member team of a gRNA molecule and the Cas9 protein, which was originally a defence mechanism in naturally occurring bacteria, is now a scientific tool that allows us to break DNA at a very specific position in the chain.
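For readers who think in code, the locate-and-cut step can be caricatured as string search on a genome written in A, C, G and T. This is a toy model under loudly stated assumptions, not bioinformatics software: `find_target` and `cut_and_repair` are hypothetical helpers, matching is plain exact text matching (real guide RNAs pair with complementary DNA, and Cas9 has further requirements this ignores), the cut is placed arbitrarily in the middle of the site, and the sloppy repair is modelled as a single random inserted or deleted base.

```python
import random


def find_target(genome: str, guide: str) -> int:
    """Return the index where `guide` first matches `genome`, or -1.

    This plays the role of the gRNA walking down the street until it
    finds the shop in the photograph.
    """
    return genome.find(guide)


def cut_and_repair(genome: str, guide: str, rng: random.Random) -> str:
    """Cut inside the matched site and simulate imperfect repair.

    Toy assumption: the cut sits in the middle of the site, and the sloppy
    repair either deletes one base or inserts one random base there.
    """
    pos = find_target(genome, guide)
    if pos == -1:
        return genome  # no matching site, so Cas9 never cuts
    cut = pos + len(guide) // 2
    if rng.random() < 0.5:
        return genome[:cut] + genome[cut + 1:]  # lose one base at the cut
    return genome[:cut] + rng.choice("ACGT") + genome[cut:]  # gain one base


genome = "TTGACCATGGCTAGCTAGGATCCAAGT"
guide = "GCTAGCTAGG"
print(find_target(genome, guide))  # 9
edited = cut_and_repair(genome, guide, random.Random(0))
print(find_target(edited, guide))  # -1: the site is no longer intact
```

After the toy repair the guide no longer finds its site, which is the code equivalent of a disrupted gene.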

Since the repair of the cut DNA is imperfect, the gene is incapacitated, or knocked out. This is often desirable when the gene is responsible for a complicated disease. But what is even more interesting is that we can augment the two-member gRNA and Cas9 team with a third member: an artificially prepared repair template consisting of the set of bases that we want to replace the original sequence with. Going back to our original analogy of the human guard, he is now also given a bunch of flowers, which he hands to the intruder after knocking out his garbage can, so that instead of the stink of garbage, the intruder leaves our colony with the fragrance of roses.
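The third member of the team, the repair template, can be added to the same kind of toy string model. Again this is only an illustration under simplifying assumptions: `edit_with_template` is a hypothetical helper, and template-guided repair is reduced to splicing the template in place of the matched site, which glosses over how cells actually copy from a template.

```python
def edit_with_template(genome: str, guide: str, template: str) -> str:
    """Toy model: the guide locates the site, the 'cut' removes it, and the
    repair template is spliced in its place (a simplification, not biology)."""
    pos = genome.find(guide)  # locate the site, as the gRNA would
    if pos == -1:
        return genome  # no matching site, so no edit
    return genome[:pos] + template + genome[pos + len(guide):]


genome = "TTGACCATGGCTAGCTAGGATCCAAGT"
edited = edit_with_template(genome, "GCTAGCTAGG", "GCTAGATAGG")
print(edited)  # TTGACCATGGCTAGATAGGATCCAAGT
```

Here one letter of the original site is rewritten exactly as the template dictates, rather than being scrambled by a sloppy repair.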

CRISPR/Cas9 is a precision tool that can make small, targeted changes in DNA relatively easily. It is like using a fine brush to retouch a precious painting instead of the earlier approach of throwing a bucket of paint at it or using a high-volume spray gun.

While CRISPR sequences were first reported in 1987, it was only in 2012 that Jennifer Doudna of the University of California, Berkeley and her collaborator Emmanuelle Charpentier demonstrated that this two-molecule combination of gRNA and Cas9 protein could be used to make precise modifications to the genome. Later that same year, Feng Zhang of the Broad Institute of MIT and Harvard filed for a patent on the same technology, and the two teams have since been locked in an intellectual property battle of epic proportions. The commercial implications of this technology are immense. The race is now on to create specific gRNA molecules that will locate and attach themselves to specific positions on the DNA of specific organisms, along with the corresponding Cas proteins that will cut the DNA there. Human DNA is a very lucrative target because editing it may lead to cures for genetically transmitted diseases, but plant and animal DNA is equally interesting, as editing it could lead to disease-resistant or high-yield crops. All three principal actors in this drama have formed their own biotech firms to exploit the commercial benefits, and the firms founded by Doudna and Zhang have already gone public. Finally, all three are widely tipped to win the Nobel Prize sooner or later for this remarkable technology.

Modifying the genetic code will lead to the creation of new, synthetic or hybrid organisms. This may not always be desirable, but as we know, no army can stop an idea whose time has come, and gene modification is one such unstoppable idea. With CRISPR, we can now do it faster and cheaper.
