September 01, 2015

The Social Media route to Digital India

Can Digital India be launched on the back of a private social media network ?

E-Governance is a concept that has fascinated bureaucrats, academics and the IT industry in India for many years, but the term is ill-defined. At one end of the spectrum we have government departments that put up web portals with static, mostly obsolete, data while at the other end we have useful applications for, say, passports and income tax. This great disparity in sophistication and utility arises because each such application is the result of an independent initiative and reflects the vision of the owner and the competence of the vendor who was awarded the tender on the L1 (lowest cost) basis. In the corporate sector this is referred to as the “Thousand Island” scenario -- with “islands” of automation separated by gaps of inconsistent data -- and the common solution that is offered is based on an “ERP” like SAP that can tie together all parts of the organisation with one coherent software. Unfortunately there is no such universal platform for government requirements but now, with a new push for Digital India, we could perhaps envisage something similar -- based on the generic idea of social media networks like Facebook.

Why social media ? Because this is a platform that is very simple and convenient for people to use and relate to -- no one has to be taught how to use Facebook or LinkedIn! There is something intuitive about “friendships” and “group membership” that anyone can relate to and participate in through status updates, comments and “likes”. So if we create a private social media network -- let us call it publiNc, a Facebook look-alike -- it should be fairly easy for government employees to join and use.

But use it for what ?

Since a large part of government work consists of collection and dissemination of information, a social media platform, with its collection of blogs, forums, posts, comments, replies, document attachment and messaging facilities, could be a simple starting point. Currently employees of the government operate on a diverse set of platforms but now in publiNc they will have integrated messaging, email, chat, and VoIP voice -- all available on desktops as well as smartphones. Individual departments (and sub-departmental groupings) can be reflected as publiNc communities, with their own “pages” or “groups”, some of which could be closed and private while others could be open with some degree of moderation. Departmental administrators who currently arrange for tea, tables, cubicles, cars etc., can be trained to be “group admins” responsible for moderation. By facilitating all communication within the government, publiNc will be the first step towards Digital India.

But there is more to government than sharing information. Information has to be recorded, processed and displayed on demand -- through database driven transactional applications. On social media we see this in the form of thousands of games or apps -- independently developed software like Candy Crush, FarmVille etc.  -- that sit on top of and provide diversity to the social media experience. So would it be on publiNc. Individual departments, or “communities”, can have their own software developed as an “app” and users who need this functionality can add it to their profiles. publiNc apps built for one department and found to be useful can also be used by similar departments operating in other parts of the country. Since these “apps” are generally hosted on independent hardware, issues related to security and privacy can also be addressed.

Another key requirement in government is approvals, along with audit trails. This is a straightforward workflow application where an electronic document moves through a series of predefined named individuals who must take some action, that signifies approval, before it moves to the next person. In social media terms, this means that after a user has taken some action with a publiNc app, the app must automatically post an update on the wall of the next person along with a link that will lead him back to the same app. This is no different from tagging friends on a post or Facebook games that automatically invite friends to come and play the same game. This is a great irritation on Facebook but is perfectly suited for our publiNc workflow since it sets an automatic reminder for the next person to take action.
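The approval chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design for publiNc: the names `ApprovalRequest`, `notify`, `approve` and `walls`, and the `app://` link, are all hypothetical, and a real app would persist the audit trail and authenticate users.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class ApprovalRequest:
    """A document that must pass through a fixed chain of named approvers."""
    title: str
    approvers: List[str]          # predefined order of named individuals
    link: str                     # hypothetical link leading back into the app
    step: int = 0                 # index of the approver whose turn it is
    audit_trail: List[Tuple[str, str, str]] = field(default_factory=list)

    def current_approver(self) -> Optional[str]:
        """Return whose turn it is, or None once everyone has approved."""
        if self.step < len(self.approvers):
            return self.approvers[self.step]
        return None


def notify(walls: Dict[str, List[str]], request: ApprovalRequest) -> None:
    """Post a reminder with a link on the wall of the next approver, if any."""
    person = request.current_approver()
    if person is not None:
        message = f"Action needed: {request.title} -> {request.link}"
        walls.setdefault(person, []).append(message)


def approve(walls: Dict[str, List[str]], request: ApprovalRequest, user: str) -> None:
    """Record an approval by `user` and pass the request to the next person."""
    if user != request.current_approver():
        raise PermissionError(f"{user} is not the current approver")
    timestamp = datetime.now(timezone.utc).isoformat()
    request.audit_trail.append((user, "approved", timestamp))
    request.step += 1
    notify(walls, request)


walls: Dict[str, List[str]] = {}
leave = ApprovalRequest(
    "Leave application", ["section_officer", "under_secretary"], "app://leave/42"
)
notify(walls, leave)                       # reminder lands on the first approver's wall
approve(walls, leave, "section_officer")   # the request moves on to the next person
```

Every approval is appended to `audit_trail` with a timestamp, which is the kind of record the audit requirement calls for; the wall post doubles as the automatic reminder.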

Finally, privacy settings for individuals, communities and apps will ensure that only the right people will know who is doing, or not doing, what and for how long.

The next step would be to allow private citizens to open accounts in publiNc so that they too can communicate with government employees. But before being activated, all such accounts should be verified against some government document like PAN card, passport or Voter ID card and a KYC-compliant mobile number to establish the authenticity of the user. With such an account citizens should be able to communicate with individual government employees or with the “pages” created by departments for such purposes. Questions posted on such pages can be answered by a designated employee and, depending on privacy settings, can be viewed by many. Similarly, citizens can be allowed to install and use certain apps that allow them to send applications, pay fees and taxes or do anything else that would otherwise need a visit to a government office and a wait in a queue in front of a counter. Responses from the government department can be posted on searchable forum posts or sent to private inboxes.

So now we have a complete digital ecosystem that could link all parts of the government with each other and, potentially, with each and every citizen of this country. Normally, such a gigantic system would take years to build and need thousands of computers to run on but the beauty of social media is that it is intensely scalable. When Zuckerberg created Facebook in 2004 it was a tiny system meant for a few students of Harvard University. So can it be with publiNc. We can begin with a small system with a few apps that caters to the needs of a couple of departments at the Centre and a few of the more dynamic States. Then, just like Facebook, publiNc should attract people, both government employees and citizens, and add more and more e-governance apps until it becomes the behemoth that Facebook is today -- spanning the whole world and having more than 1 billion users -- the same as India’s population today.

The technology for building a social media network like publiNc is readily available, and much of it is in the public domain. Issues related to scalability, stability and security are all known and easily addressable within reasonable cost and time. While 80% of India’s population may not have access to the internet, the other 200+ million who have access constitute a large enough pool worth catering to and with the proliferation of smartphones the number of publiNc users will grow by leaps and bounds. So there is no doubt that the system can be built very easily and can be readily used both by government employees and by citizens.

But the real challenge lies elsewhere. Will our government employees want to use such a system ?

Can our politicians and bureaucrats tolerate the intense transparency that a system like publiNc will bring to the country? While privacy settings may keep many of the actions and decisions out of the public view, all actions taken -- or not taken -- will be automatically noted, recorded, stored and may be retrieved by duly authorised agencies. This can be a serious blow to the discretionary powers enjoyed, and abused, by government employees. Hence the biggest challenge to publiNc would come from the entrenched vested interests who would be reluctant to allow the system to work. There is a clear and present danger of publiNc being sabotaged from within.

But this is not a problem restricted to publiNc alone. It is something that can plague any e-Governance system that is created to support the vision of Digital India. So there can be no technology based approach that will facilitate or expedite the adoption of e-Governance applications -- we need a “social” approach, one driven by peer pressure or the need to conform to the environment.

This is where social media based software is so different from the typical IT systems that we use in banks, shops, government agencies and other commercial environments. No one needs to force anyone to join and use networks like Facebook or LinkedIn. People not only find these systems useful and easy to use but, what is most important, without them they feel that they are being left behind by their peers. Hence the urgent need to join and catch up. So if a few pioneering departments can adopt this social media approach, most of their employees would willy-nilly join in and then other departments can join in one after the other. Adoption can be expedited, however, with a few useful and universal apps like leave application, provident fund or new pension scheme updates and so on.

Finally, one can ask if such a social media based approach to e-Governance has been tried in any other country. Possibly not, but for a change, let Digital India show the world a new way of doing things, instead of forever copying what is happening elsewhere ?
This article originally appeared in the August Issue of Swarajya, the magazine that reads India right!

August 13, 2015

Wrahool arrives in New York

July 17, 2015

Distance Learning - Reloaded

Distance learning using the internet -- that allows students anywhere in the country to learn from the best teachers -- is old news! We have heard of Massive Open Online Courses (MOOCs) popularised by Coursera and Udemy, applauded the good work done by the Khan Academy and proudly talk about the gigabytes of, rather boring, videos loaded into YouTube by IIT professors under the NPTEL program. But none of this has had any impact on the critical skill gap that separates students who pass out of India’s colleges from the jobs that await them in a booming economy.

Institutions like IITs, IIMs, NITs, Presidency and others may be doing well, but there are another 600 degree granting institutions with more than 35,000 affiliated colleges that have lost the plot completely. Most of these have inadequate infrastructure, teachers who are barely competent or rarely in the classroom, outdated syllabi and an academic atmosphere vitiated by student politics. And yet it is to colleges like these that nearly 2 crore students -- potential contributors to India’s demographic dividend -- have to turn to realise their dreams of getting a bachelor’s degree.

Reforming college education in India is beyond the ability of mere mortals. A complex hubris of policy, bureaucracy, corruption, hypocrisy, xenophobia and old fashioned stupidity ensures that modern management techniques cannot be used. Private investors are not allowed to make profits from education but are harassed for bribes by venal regulatory bodies and the government has neither the means nor the ability to deliver. So instead of platitudes about “revamping the system” let us explore an alternate architecture that will deliver education and help students find jobs.

The only way to bypass the local college is to resort, once again, to web based distance learning mechanisms where the technology necessary for creating and distributing online content is available either free or for a nominal cost. The primary content would be videos recorded with a webcam and uploaded into YouTube. The free Google Hangout-on-Air feature allows not only videos recorded on webcams but also screen capture from PowerPoint style slide decks or any other program. At the student end, all that is needed is a computer and broadband access, and given the amount of money spent on private tuitions, this is an investment that most students and their parents would be happy to make if they see value.

But to show value we need to cross two big hurdles, namely an economic model for content creation and a stamp of accreditation on the education delivered.

Creating online content, of the quality that we see on Coursera or Udemy, is a very labour intensive process and no teacher would ever put in this effort on a sustained basis unless there is adequate compensation. But on the internet, everyone expects everything to be free! Students will happily spend money for movies at a cineplex but not to access online courses. When the courseware is free, everyone will use it but the moment there is a charge, most students will back off -- even though they may be paying the same for tuition! So this is the first challenge -- why would anyone create online content when there is no guarantee of any financial return? Volunteers can never be the basis of any sustainable model.

The second challenge is trust. Corporates flock to IITs and IIMs not because of what they teach but because students have been filtered through JEE or CAT. Similarly, when corporates ask for graduates with a BA, BSc or BCom, it is not that they are specifically looking for a detailed knowledge of History, Chemistry or Accounting; they only want an assurance that the candidate has the discipline to study for and pass examinations in at least thirty subjects and in the process has learnt something useful. Since the stamp of a degree granting institution is a necessary condition for hiring a person, such a stamp is essential for the credibility of any alternate model.

How should we address these two critical issues?

Content remains free for students but the government will pay teachers to create it. For each course module or video created and uploaded, the government can pay a nominal amount of money, say Rs 1000, to the teacher who creates the content. However the real payoff for the teacher will come if the video is found to be useful by students. YouTube tracks views as well as an approximate duration of the view. Teachers may be paid, say, Rs 20 each time more than 70% of a video is viewed on the web. An IIT faculty member, for comparison, is paid around Rs 150 per student-lecture for delivering 100 lectures across 4 subjects to classes of 50 students each. So instead of a flat payment to all content creators, we now pay more to teachers who create better content based on an objective metric of actual viewership -- the invisible hand of the market at work!
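As a rough sketch of the payout rule above, the following Python snippet counts a view as payable only when more than 70% of the video was watched. The Rs 1000 and Rs 20 figures and the 70% threshold come from the text; the function name and the sample watch fractions are made up for illustration.

```python
UPLOAD_FEE = 1000        # Rs paid once per module uploaded (figure from the text)
PER_VIEW_FEE = 20        # Rs paid per qualifying view (figure from the text)
WATCH_THRESHOLD = 0.70   # a view qualifies only if more than 70% was watched


def module_payout(watch_fractions):
    """Total payout in Rs for one module, given the fraction watched in each view."""
    qualifying = sum(1 for fraction in watch_fractions if fraction > WATCH_THRESHOLD)
    return UPLOAD_FEE + PER_VIEW_FEE * qualifying


# Hypothetical data: five views, three of them watched past the threshold.
sample_views = [0.95, 0.40, 0.71, 0.70, 1.00]
payout = module_payout(sample_views)
```

A view that stops at exactly 70% does not qualify here, since the text says "more than 70%"; where that boundary should sit is a policy choice rather than a technical one.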

The next critical issue is certification. Online examinations are possible but given the mutual mistrust that is endemic to Indian society, the credibility of examinations taken by students from home is zero. What we need instead are distributed examination centres where students can take examinations under invigilation. Given the long history of JEE, CAT, AIEEE, Bank PO, UPSC and similar examinations, this process is well understood. In fact there are a number of private companies that have earned the trust of corporates in conducting examinations and can do so at a lower cost and with higher credibility.

Let us now connect the dots.

We begin with one or more UGC approved degree granting institutions that announce a set of courses for which they will conduct examinations. These would be standard 96 credit bachelors level courses leading to a BSc, BA or BCom degree -- in areas like computer science, economics, commerce, political science, mathematics or anything that does not require physical laboratory facilities. Each institution will specify the syllabus for the 32 three-credit subjects that the students will be examined on and appoint one or more agencies that have the physical facilities to conduct online examinations across the country. Students will pay and register for a series of examinations, subject to meeting prerequisites, and on successful completion accumulate credits towards the final degree.

In parallel, the UGC will invite teachers across the country to create online YouTube content for the subjects necessary to pass the examinations. Any teacher in any college anywhere in the country can start creating content. If the economic incentive -- in terms of the upfront as well as the monthly payouts -- is adequate, a large volume of high quality content will get generated over a period of time, and the best part of the deal is that a significant part of the government expenditure will be directed towards content that students find useful and actually view or use.

For the 32 subjects, leading to a 96 credit bachelors degree, 640 modules have to be created. At Rs 1000 per module, the upfront cost to the government is a paltry Rs 6.4 lakhs if we assume only one set of videos is produced, though in reality multiple teachers may prepare the same videos and compete for popularity. The teacher, on the other hand, gets Rs 20,000 for creating 20 lecture-modules and then, at Rs 20 per view, can easily make another Rs 20,000 per month if his 20 modules are viewed even 1000 times in all. In fact a fraction of the money received through the 3G/4G auctions would be more than adequate to pay teachers for creating content that will be accessed through these networks.
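The arithmetic in this paragraph can be checked with a short Python sketch. The figure of 20 modules per subject is inferred from 640 modules spread over 32 subjects, and the 1000 views are taken as the total across a teacher's 20 modules; the variable names are only for illustration.

```python
SUBJECTS = 32
CREDITS_PER_SUBJECT = 3
MODULES_PER_SUBJECT = 20     # inferred: 640 modules / 32 subjects
UPLOAD_FEE = 1000            # Rs per module uploaded
PER_VIEW_FEE = 20            # Rs per qualifying view

total_credits = SUBJECTS * CREDITS_PER_SUBJECT        # credits in the degree
total_modules = SUBJECTS * MODULES_PER_SUBJECT        # modules for one full set
upfront_cost = total_modules * UPLOAD_FEE             # Rs, government outlay for one set

teacher_upfront = MODULES_PER_SUBJECT * UPLOAD_FEE    # Rs, one teacher covering one subject
monthly_views = 1000                                  # assumed total across the 20 modules
teacher_monthly = monthly_views * PER_VIEW_FEE        # Rs per month from views

print(total_credits, total_modules, upfront_cost, teacher_upfront, teacher_monthly)
```

If several teachers produce competing sets, the upfront cost scales with the number of sets, while the per-view cost depends only on how much is actually watched.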

Since students have the flexibility to view lectures from any of the ‘competing’ teachers who are teaching the same subject, enterprising teachers can create social media groups in Google+ or LinkedIn and add value to their lectures by creating discussion threads where students can ask questions and clear doubts. This will enhance the popularity of the lectures, increase viewership and lead to higher payouts. In fact, a dedicated social media network, like the Kollaborative Klassroom at Praxis Business School, should be created as well for this purpose.

But the key to the success of any educational initiative in India is job placements. IITs, IIMs command respect, not for what they teach but for the jobs that their students get. So the capstone or pinnacle of this new educational architecture must be a forward integration with a job portal like Naukri or Monster that has a pro-active, professional placement mechanism built around a business model that earns money from successful placements. Those who pass examinations and qualify for the degree should be automatically registered on the portal and actively marketed to corporates. Since the portal makes money from placements, the feedback that it collects from recruiters must be used by the degree granting institution to define curricula and refine syllabi so that the skill gap can be narrowed significantly.

The problem with MOOCs is that without a strong motivation to complete, most students drop out, which is why we need the strong attraction of a degree and job placement at the end. Most of the components of this new ecosystem are already available. It only needs a little money from the government and an enterprising administrative mechanism to connect all the dots and assemble a bold new way of delivering quality education to students and money to good teachers.
This article originally appeared in Swarajya -- the magazine that reads India right

June 29, 2015

From Hadoop Streaming to RHadoop

The challenge of combining the statistical power of R and the "Big Data" capabilities of Hadoop is something that has always fascinated me. Over a year ago, I had finally broken free from the stupidity of the WordCount (and various other counting) programs and tried to solve a real-life retail problem with linear regression using R and Hadoop. This is documented in my blog post Forecasting Retail Sales -- Linear Regression with R and Hadoop. In this case, however, I had used the Hadoop streaming API to call two separate R programs.

Subsequently I had come across the Hortonworks HDP platform that dramatically simplified the process of installing and running Hadoop. This is explained in my blog post Big Data for the Non Geek, where in addition to installing Hadoop, I have also explained how to overcome the challenges of installing the RHadoop packages on top of Hadoop on the Hortonworks platform.

Hortonworks has a nice example of how to run an rHadoop program on the HDP platform but I was very keen to see how to port my Retail Sales program from the traditional streaming mode to the rHadoop mode. This means replacing the two R programs LinReg-map.R and LinReg-red.R with one R program LinReg-MR.R and running this, not from the Linux command prompt but from RStudio itself.

This process is explained in this post.

First I had to check whether my original LinReg-map.R and LinReg-red.R would work on the Hortonworks HDP platform. Fortunately, they did, but a small change was required in the command line -- note the two -file options attached right at the very end.

# -- used in the Hortonworks HDP .. two -file commands required
hdfs dfs -rm -r /user/ru1/retail/out900
# ---
hadoop jar /usr/hdp/ -D'RetailR' -mapper /home/ru1/retail/LinReg-map.R -reducer /home/ru1/retail/LinReg-red.R -input /user/ru1/retail/in-txt -output /user/ru1/retail/out900 -file LinReg-map.R -file LinReg-red.R 

Next thing that we had to do was to convert the five tab separated TXT files used as input into five corresponding comma separated CSV files. I am sure tab separated TXT files can also be used but for the time being it was easier to convert the data files to CSV than to explore the TXT option.
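The post does not say how the conversion was done. One way, sketched here in Python with the standard `csv` module rather than in R, is shown below; the function name and the sample rows are placeholders, not taken from the repository.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy rows from a tab-separated text stream to a comma-separated one."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src, delimiter="\t"):
        writer.writerow(row)


# Made-up rows in the date / sku / sale layout that the mapper expects.
sample = io.StringIO("1\tSKU1\t10\n2\tSKU1\t12\n")
converted = io.StringIO()
tsv_to_csv(sample, converted)
print(converted.getvalue())
```

With real files, both would be opened with `newline=""` as the `csv` module documentation recommends, and the function called once per input file.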

Finally, the two R programs were merged into one R program, and this is available in the rHadoop directory of the original Retail github repository.

Significant changes are as follows --
In the MAP part

-- the original Map program

DailySales <- read.table("stdin",col.names=c("date","sku","sale"))
for(i in 1:nrow(DailySales)){
  # key = SKU, value = date$sale with the spaces stripped out
  Key <- as.character(DailySales[i,]$sku)
  Val <- paste(as.character(DailySales[i,]$date),"$",as.character(DailySales[i,]$sale))
  cat(Key,gsub(" ","",Val),"\n")
}

-- the modified Map subroutine

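mapper1 = function(null,line) {
    ckey = line[[2]]                              # SKU is the key
    cval = paste(line[[1]],line[[3]],sep = "$")   # value is date$sale
    keyval(ckey,cval)                             # rmr2 emit; assumed, the excerpt omits this line
}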

The Reduce part is greatly simplified

-- the original Reduce program

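mapOut <- read.table("stdin",col.names=c("mapkey","mapval"))
CurrSKU <- as.character(mapOut[1,]$mapkey)
CurrVal <- ""
FIRSTROW <- TRUE   # assumed initialisation; not shown in the original excerpt
days <- ""
sale <- ""
for(i in 1:nrow(mapOut)){
  SKU <- as.character(mapOut[i,]$mapkey)
  Val <- as.character(mapOut[i,]$mapval)
  DataVal <- unlist(strsplit(Val,"\\$"))
  if (identical(SKU,CurrSKU)){
    # same SKU as the previous row: keep accumulating its days and sales
    CurrVal = paste(CurrVal, Val)
    if (FIRSTROW)  {
      days <- DataVal[1]
      sale <- DataVal[2]
      FIRSTROW <- FALSE   # assumed; not shown in the original excerpt
    } else {
      days = paste(days,DataVal[1])
      sale = paste(sale,DataVal[2])
    }
  } else {
    # SKU has changed: the regression for the finished SKU runs at this point
    # (that call is not part of the excerpt), then the accumulators restart
    CurrSKU <- SKU
    CurrVal <- Val
    days <- DataVal[1]
    sale <- DataVal[2]
  }
}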

-- the modified Reduce subroutine

reducer1 = function(key,val.list) {
  days <- ""
  sale <- ""
  for(line in val.list) {
    DataVal <- unlist(strsplit(line, split="\\$"))
    days <- paste(days,DataVal[[1]])
    sale <- paste(sale,DataVal[[2]])
  }

  retVal <- EstValue(key,days,sale,9)
  keyval(key,retVal)   # assumed emit step; the excerpt stops before the return
}

The "key" difference is that instead of using the cat command to emit the key-value pair, we use the keyval function of the rmr2 package to move data from the mapper to the reducer. Also, the reducer gets all the values for one key, so no sequential processing is required to isolate the values associated with one key.
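The difference between the two reducer styles can be illustrated outside Hadoop. The sketch below is plain Python, not R or rmr2, the data is invented, and a simple sum stands in for the regression: `streaming_reduce` sees a sorted stream of key-value pairs and has to watch for the key changing, whereas `grouped_reduce` is handed all the values for one key at once, as the rmr2 reducer is.

```python
from itertools import groupby


def streaming_reduce(sorted_pairs):
    """Streaming style: scan sorted (key, value) pairs and detect key changes by hand."""
    results = {}
    current_key, values = None, []
    for key, value in sorted_pairs:
        if key != current_key and current_key is not None:
            results[current_key] = sum(values)   # flush the key that just finished
            values = []
        current_key = key
        values.append(value)
    if current_key is not None:
        results[current_key] = sum(values)       # flush the last key
    return results


def grouped_reduce(key, values):
    """Grouped style: the framework has already collected every value for `key`."""
    return key, sum(values)


# Invented (sku, sale) pairs, sorted by key as the shuffle phase would deliver them.
pairs = [("A", 3), ("A", 5), ("B", 2), ("B", 4), ("B", 1)]
streamed = streaming_reduce(pairs)
grouped = dict(
    grouped_reduce(key, [value for _, value in group])
    for key, group in groupby(pairs, key=lambda pair: pair[0])
)
```

Both routes give the same totals; the grouped version simply has no bookkeeping for the current key, which is why the modified reducer is so much shorter than the original.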

The actual Linear Regression is done by the EstValue function and ideally it should have had no changes at all. However there was ONE change that was required and that is shown here.

#PastSale = Reduce("+",sale)
PastSale = 0
for (j in 2:length(sale))PastSale = PastSale + sale[j]
#PastSale= sale[2]

The total of Past Sales, though not required for the regression, was being calculated by the Reduce function but somehow this would never work in rHadoop. It had to be replaced with a manual loop, and that too starting from 2! However, the answer in both cases -- streaming and rHadoop -- is the same.

To be honest, rHadoop is actually no different from the Hadoop streaming process. In fact it is one and the same but there are some small changes that one needs to make and the benefit is that one can work from the familiar confines of the RStudio as you can see from this screenshot.

Update :  A very comprehensive tutorial on RHadoop is available here. Here is a guide on how to run RHadoop on Amazon AWS EMR

May 19, 2015

Technology, Management & Systems : The Holy Trinity ?

India has a strange fascination for engineers and MBAs. Everybody wants to become one, or preferably both. So is the case with systems, or as they say in India, the IT sector. But this fascination is not because of any natural aptitude for these disciplines but simply because they help one to get a job in an otherwise dismal economy. This is unfortunate, because if we step back for a moment and think through  issues that haunt this country, it would seem that our salvation lies in leveraging this holy trinity to dig us out of the hole that we find ourselves in!

Let us consider a few exemplary scenarios.

Till the 1990s, telephones in India were a disgrace. While landline technology was readily available and widely used all across the developed world, we were still at the mercy of the corrupt and inefficient P&T department that ensured that very few of us had access to one. This changed dramatically with the arrival of cell-phones that  bypassed the constraints of the local loop and managed to put a phone in every hand. But just as we were about to take off, the evil empire struck back with the 2G scam that again put us back by several decades in 3G and 4G services.
If we go back further, to the 1960s, we would see that India was facing a massive shortage of food and waiting for PL480 handouts from the United States to fight famines. Then again it was a burst of technology -- food technology, in the form of better fertilizers and high yielding crops -- that saved the day. We managed to stave off starvation but once again the theft in the public distribution system and the mismanagement of the supply chain -- as visible in images of grains rotting in godowns --   brought back the spectre of Kalahandi and haunts us even today.

In fact, India lost the plot much earlier, in the middle ages when we failed to board the Renaissance bus. While the Mughals were celebrating the high noon of their culture with the Taj Mahal and Urdu shayaries, Europe was passing us by with the structured and rational thinking of Galileo, Rousseau and Newton. Administrative systems like the modern nation state, with its civil and military services, economic systems like joint stock companies, banks, insurance agencies and educational systems built around universities that granted formal degrees never really got off the ground in India in the same way that they did in Europe. Not until the British brought them here.

Systems, in this context, are a metaphor for the rational approach to address social and commercial problems -- free of divine directives or bureaucratic whimsy. Such systems result in the development of superior technology and in the efficient usage of natural and social resources. This in turn reduces the degree of social conflict, increases the physical living standards and empowers societies with the luxury of confidence. Such systematic societies can then confront, conquer and convert societies that are irrational, unstructured or unsystematic -- as we have seen in the case of Europeans conquering large parts of America, Asia and all of Australia.

Computer “systems”, that appeared much later, draw their inspiration, and of course the moniker systems, from the same structured, rational,  or systematic approach that they bring to bear on solving problems and are a specific example of a more general approach. Can such systems free the future of India from the confines of its past and its present?

A little introspection will show us that we are inherently irrational in our thoughts and chaotic, if not anarchic, in our deeds. In our irrationality, we could have still lived with our little obsessions with gods, “godmen” and superstitions but the real problem is when the irrationality seeps into our secular and political structures. We scream and protest against corruption but are the first when it comes to boasting about how we could bribe our way through the system and -- this is far worse -- a majority of us would not hesitate to seek bribes and favours whenever we are in a position to do so. This is particularly true for anyone with any kind of discretionary power within the government -- whether it is a peon, who will not let you meet the officer, or the officer himself who can sign a paper and make your life a little easier.

Corruption is a characteristic that is woven into the warp and woof of the Indian administration and yet we have the unedifying spectacle of hundreds and thousands of irrational souls sitting on dharna at Jantar Mantar and asking for a Lokpal -- as if one man can do what a whole zoo full of institutions like the CAG, CVC, CBI could not achieve! We refuse to believe, to quote a Bengali maxim, that the evil spirit is in the very mustard that is being used for the exorcism!

Our irrationality extends to our naive belief in the democratic process where we cannot see through the chicanery of the promises made by the candidates. We believe that it is correct to vote for the person who promises illegal and untenable benefits for me and my caste. This natural irrationality is extended and reinforced by our love for anarchy. We believe that burning buses and other public property, calling bandhs and blockading roads is a natural and justified way in which our “leaders” or elected representatives can help us towards a better life. This irrationality is not the exclusive preserve of the uneducated and we see even people with college education passionately espousing economic theories that have been discarded in the dustbins of history. Robust, rational pragmatism is hard to come by in this country.

We could go on with similar anecdotes but the point is that as a nation we are incapable of governing ourselves. In state after state and in every Lok Sabha election we have voted for, elected and given to ourselves the terrible governments that we, as an irrational and anarchic people, justly deserve. But does this mean that it is time for the British Parliament to repeal the Indian Independence Act of 1947? Fortunately, there could be a better solution  based on the trinity of technology, management and systems.

Expecting government officials to shun corruption or the electorate to vote rationally is like expecting Kolkata to have weather like Darjeeling’s. There may be a few exceptions here and there but by and large, very, very unlikely. We need to work with these two chips on our shoulders. Bending, breaking and abusing the system is a leitmotif of India. Individual Indians may be sane and rational but in a mass and as a collective, they will never be. This is the foundational principle on which the governance of India needs to be based.

Since people are at the heart of the problem we need to minimise their role in discretionary decisions and, to the extent possible, in the delivery process. Cell phones succeeded where land line phones failed because they did not need an army of corrupt, anarchic people to maintain the thousands of lines running across the country. The towers are unmanned and the central switches need a few competent people. This is a perfect example of a technology trumping the accumulated hubris of centuries and is the model that we must try to emulate in other areas as well.

This though, is easier said than done!

The technology is never the issue but to implement it against the wishes of people, who see this as an infringement of their fundamental right to be corrupt or anarchic, is the real challenge. This is where smart management techniques come in very handy. The key is to use the carrot and the stick to cajole, convince, convert, confuse or coerce everyone so that they have no option but to be yoked to structured, technology-enabled systems. Individual brilliance and creativity is great and diversity is something wonderful to celebrate but, if cars do not stop at traffic lights but only when people block the road, then society collapses into the kind of anarchy that India is familiar with. Net-net, we need to design systems that will bring technology and management techniques into the governance process in a manner that minimizes the need for people in the governance process.

Is this possible at all ? To a large extent, yes.

Since data should be the basis of any rational decision, our systems must forcibly collect data and place it in the public domain. Next a clear set of algorithms, or rules, must be put in place so that the data itself drives decisions -- say, for example, approvals for or limits on expenditure, the quantum of taxes due -- in a way where humans have only a supervisory role. Finally the data, the process of arriving at a decision and the decision itself must be automatically visible to the public. This is a generic template for transparent governance. A simple example would be a Wikimapia style map showing physical locations of NREGA projects along with time-stamped, GPS encoded pictures shot before and after the project is executed -- without which no further funds will be released to the panchayat in question. Three previous articles in these columns have shown how similar systems can indeed be designed to help expedite justice in courts, facilitate elections and track corruption at the operational level.
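The template above (collect the data, apply a published rule, expose the decision) can be sketched in a few lines of R. This is only a minimal illustration: the table layout, the column names and the `release_decision` function are assumptions made for this sketch and do not describe any existing NREGA system.

```r
# Hypothetical project register: one row per NREGA project.
# All column names and values are illustrative assumptions.
projects <- data.frame(
  project_id = c("P001", "P002", "P003"),
  panchayat  = c("A", "A", "B"),
  has_before = c(TRUE, TRUE, TRUE),    # geo-tagged "before" picture uploaded
  has_after  = c(TRUE, FALSE, TRUE),   # geo-tagged "after" picture uploaded
  stringsAsFactors = FALSE
)

# Rule: a panchayat gets further funds only if every one of its
# projects has both pictures on record.
release_decision <- function(projects) {
  complete <- projects$has_before & projects$has_after
  ok <- tapply(complete, projects$panchayat, all)
  data.frame(panchayat     = names(ok),
             release_funds = as.vector(ok),
             stringsAsFactors = FALSE)
}

decision <- release_decision(projects)
print(decision)   # this table, along with the register, would be published
```

The human role here is supervisory: someone can audit the register or the rule, but the decision itself follows from the data.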

The design and implementation of such systems would of course eliminate a lot of redundant but hugely lucrative positions in the administration and so would be stoutly resisted by an army of the most corrupt. This can be overcome only if the elected leadership has the political will and the administrative wherewithal to place a few honest and technophile administrators at key decision making posts in government. This is the only, and minimal, ask if we want to see technology enabled rationality in the governance of this country.

The tyranny of a Singapore style benevolent dictatorship may pose too big a risk for a big, multicultural country like India but the tyranny of systems developed and deployed by a few smart and well meaning people employed by an elected government is the answer to India’s perennial problems.

This article originally appeared in Swarajyamag -- the Magazine that reads India right

May 03, 2015

Maps of India : DIY with R and GADM data

Displaying spatial data on maps is always interesting but most Visualisation tools do not offer facilities to create maps of India, especially at the state and lower levels. In this post, we will show how such maps can be made.

The base data for such maps, the "polygons" that define the country, the states, the districts and even the talukas ( or sub-divisions), is available from an organisation called Global Administrative Areas, or GADM. Country level files for almost all countries are available in a variety of formats including R and these are at three different levels. For India, these files can be downloaded as IND_admN.RData where N = 1,2,3. These will form the raw data from which we will create our maps.

Unfortunately, the GADM files represent a truncated Kashmir. How I wish that the Government of India and the National Atlas and Thematic Mapping Organisation would publish similar files for us. Anyway, we work with what we readily have ...

Working with R, we will need two R packages, sp and RColorBrewer :

# Load required libraries
library(sp)            # provides spplot()
library(RColorBrewer)  # provides brewer.pal()

Assuming that the downloaded RData file is located in the R working directory, the following code will generate a basic map of India showing the states

# load level 1 india data downloaded from GADM
# the RData file contains an object called gadm
load("IND_adm1.RData")
ind1 = gadm

# simple map of India with states drawn
# unfortunately, Kashmir will get truncated
spplot(ind1, "NAME_1", scales=list(draw=T), colorkey=F, main="India")

Now suppose there is some data ( economic, demographic or whatever ...) and we wish to colour each state with a colour that represents this data. We simulate this scenario by assigning a random number ( between 0 and 1) to each state and then defining the RGB colour of this region with a very simple function that converts the data into a colour value. [ This idea borrowed from gis.stackexchange ]

# map of India with states coloured with an arbitrary fake data
# fake_data is an illustrative column name for the simulated values
ind1$NAME_1 = as.factor(ind1$NAME_1)
ind1$fake_data = runif(length(ind1$NAME_1))
spplot(ind1,"NAME_1",  col.regions=rgb(0,ind1$fake_data,0), colorkey=T, main="Indian States")
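Real data will rarely lie between 0 and 1, which is the range that rgb() expects by default. A minimal sketch of a conversion function, using only base R, could look like this. The name `value_to_green` and the figures are made up for illustration.

```r
# Map a numeric vector onto shades of green with base R's rgb().
# Values are first rescaled to [0, 1].
# value_to_green is an illustrative helper name, not part of any package.
value_to_green <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) {
    scaled <- rep(0.5, length(x))        # constant data: use a mid shade
  } else {
    scaled <- (x - rng[1]) / (rng[2] - rng[1])
  }
  rgb(0, scaled, 0)
}

literacy <- c(60, 75, 90)                # made-up figures for three states
shades <- value_to_green(literacy)
print(shades)                            # darkest for the lowest value, brightest for the highest
```

The resulting vector can be passed as col.regions to spplot, after checking that the colours are in the same order as the regions being plotted.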

Now let us draw the map of any one state. First check the spelling of each state by listing the states:

# list the state names as spelt in the GADM data
unique(ind1$NAME_1)

and then executing these commands :
# map of West Bengal ( or any other state )
wb1 = (ind1[ind1$NAME_1=="West Bengal",])
spplot(wb1,"NAME_1", col.regions=rgb(0,0,1), main = "West Bengal, India",scales=list(draw=T), colorkey =F)

# map of Karnataka ( or any other state )
kt1 = (ind1[ind1$NAME_1=="Karnataka",])
spplot(kt1,"NAME_1", col.regions=rgb(0,1,0), main = "Karnataka, India",scales=list(draw=T), colorkey =F)

If we want to get and map district level data then we need to use the level 2 data as follows :

# load level 2 india data downloaded from GADM
load("IND_adm2.RData")
ind2 = gadm

and then plot the various districts as

# plotting districts of a State, in this case West Bengal
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
spplot(wb2,"NAME_1", main = "West Bengal Districts", colorkey =F)

To identify each district with a beautiful colour we can use the following commands :
# colouring the districts with rainbow of colours
wb2$NAME_2 = as.factor(wb2$NAME_2)
col = rainbow(length(levels(wb2$NAME_2)))
spplot(wb2,"NAME_2",  col.regions=col, colorkey=T)

As in the case of the states, if we assume that each district has some (economic or demographic) data and wish to colour the districts according to the intensity of this data, we can use the following code :

# colouring the districts with some simulated, fake data
# fake_data is an illustrative column name for the simulated values
wb2$NAME_2 = as.factor(wb2$NAME_2)
wb2$fake_data = runif(length(wb2$NAME_1))
spplot(wb2,"NAME_2",  col.regions=rgb(0,wb2$fake_data, 0), colorkey=T)

But we can be even more clever by allocating certain shades of colour to certain ranges of data as with this code, adapted from this website

# colouring the districts with range of colours
# cut() assigns the labels directly, so a range with no districts does not break the level names
wb2$col_no = cut(wb2$fake_data, breaks=c(0,0.2,0.4,0.6,0.8,1), labels=c("<20%", "20-40%", "40-60%","60-80%", ">80%"))
myPalette = brewer.pal(5,"Greens")
spplot(wb2, "col_no", col=grey(.9), col.regions=myPalette, main="District Wise Data")
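In practice the data will come from a separate table, not from runif, and must be matched to the districts by name. The sketch below shows one way to do this with base R's match(). A plain character vector stands in for wb2$NAME_2, and the district spellings and figures are invented for illustration.

```r
# District names as they might appear in the map object (wb2$NAME_2).
# Spellings here are illustrative; check them against the GADM data.
map_names <- c("Bankura", "Darjeeling", "Howrah", "Kolkata")

# Hypothetical external data set, in a different order and with one
# district missing. Figures are made up.
stats <- data.frame(
  district = c("Kolkata", "Bankura", "Howrah"),
  value    = c(0.87, 0.35, 0.62),
  stringsAsFactors = FALSE
)

# match() finds, for each map district, its row in the data set;
# districts without data get NA.
idx <- match(map_names, stats$district)
map_values <- stats$value[idx]

# Bin the values into five classes, as in the range-of-colours example.
classes <- cut(map_values,
               breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
               labels = c("<20%", "20-40%", "40-60%", "60-80%", ">80%"),
               include.lowest = TRUE)
print(data.frame(map_names, map_values, classes))
```

A district whose name is spelt differently in the two sources ends up as NA, which is a quick way to catch spelling mismatches before drawing the map.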

To move below the district, to the sub-division ( or taluk) level, we need to use the level three data file :

# load level 3 india data downloaded from GADM
load("IND_adm3.RData")
ind3 = gadm

# extracting data for West Bengal
wb3 = (ind3[ind3$NAME_1=="West Bengal",])

and then plot the subdivision or taluk level map as follows :

#plotting districts and sub-divisions / taluk
wb3$NAME_3 = as.factor(wb3$NAME_3)
col = rainbow(length(levels(wb3$NAME_3)))
spplot(wb3,"NAME_3", main = "Taluk, District - West Bengal", colorkey=T,col.regions=col,scales=list(draw=T))

Now let us get a map of the district - North 24 Parganas. Make sure that the name is spelt correctly.

# get map for "North 24 Parganas District"
wb3 = (ind3[ind3$NAME_1=="West Bengal",])
n24pgns3 = (wb3[wb3$NAME_2=="North 24 Parganas",])
spplot(n24pgns3,"NAME_3", colorkey =F, scales=list(draw=T), main = "24 Pgns (N) West Bengal")

and within North 24 Parganas district, we can go down to the Basirhat Subdivision ( Taluk) and draw the map as follows: 

# now draw the map of Basirhat subdivision
# recreate North 24 Parganas data
n24pgns3 = (wb3[wb3$NAME_2=="North 24 Parganas",])
basirhat3 = (n24pgns3[n24pgns3$NAME_3=="Basirhat",])
spplot(basirhat3,"NAME_3", colorkey =F, scales=list(draw=T), main = "Basirhat,24 Pgns (N) West Bengal")

This is the highest resolution ( or lowest administrative division ) that we can go to with data from gadm. However even within a map, one can "zoom" into and enlarge an area by specifying the latitudes and longitudes of a zoom box as shown here.

# zoomed in data
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
wb2$NAME_2 = as.factor(wb2$NAME_2)
col = rainbow(length(levels(wb2$NAME_2)))
spplot(wb2,"NAME_2",  col.regions=col,scales=list(draw=T),ylim=c(23.5,25),xlim=c(87,89), colorkey=T)

With this it should be possible to draw any map of India. For more comprehensive examples of such maps, please see this page.

New PostScript : The full code for creating these maps as well as additional information on how to place text and markers on these maps is available on my specialist visualisation blog.

April 22, 2015

Mapping Money Movements to Trap Corruption

In the chequered history of parliamentary legislation in India, the RTI Act stands out as a significant milestone that puts activities of the government under public scrutiny. But even though the Act gives a legitimate platform for citizens to ask questions on, the process is cumbersome and answers are often given in a manner that is not easy to understand or make use of.

But why must a citizen have to ask for something that is his by birthright? Why can the information not be released automatically? But then who decides what information is to be released? At what level of detail? At what frequency?

The biggest challenge facing India is corruption. It is the mother of all problems because it leads to and exacerbates all other problems. If controlled, the money saved can be used to address most deficiencies in health, education and other social sectors.

Misguided people wrongly believe that having a strong Lokpal will solve the problem but when bodies as powerful as the CBI and the CVC have been subverted and compromised by political interests, it is foolish to expect one more publicly funded body, like the Lokpal, to be any better. Instead, let us explore how a crowd sourced and data driven approach can help both track and crack the problem.

The central and state governments in India, between them, spend about Rs 25 lakh crores every year. Even if, very optimistically, only 10% of this is lost in corruption, then the presumptive loss to the public exchequer is Rs 2.5 lakh crores every year, compared to the one time loss of Rs 1.8 lakh crores in the  CoalGate scam.

Can we follow the clear stream of public money as it slowly gets lost in the dreary desert sands of government corruption? Most public money starts flowing from the commanding heights of the central government and passes through a complex hierarchy of state and central government departments, municipalities, zilla parishads, panchayats until it reaches the intended beneficiary, who could be a citizen, an employee or a contractor. Flows also begin from money collected as local taxes and routed through similar channels. The accompanying diagram gives a rough idea of this process.

In the language of mathematics, this is a graph consisting of a collection of nodes connected to each other through directed edges. Each node represents a government agency or public body and each directed edge represents a flow of money. Inward pointing edges mean that the node or agency receives money while outward pointing edges represent payments made. Green nodes are sources where money enters -- this could be tax, government borrowings or even the RBI printing notes, while yellow nodes are end-use destinations of public money -- salaries, contractor payments, interest payments and direct benefits transferred to citizens.

Ideally, the flow of money through this network should be such that over a period of time the total amount that flows into the network at all green nodes should equal the total that flows out into all yellow nodes. In reality the sum will never add up because there is significant leakage, or theft, in the network. The money lost in transit within the graph between green and yellow nodes is one quantifiable measure of corruption. Another, less obvious case is the inexplicable or unusually high flows of money  -- as in the case of the Chaibasa Treasury at the height of the Fodder Scam or for a sudden spurt of expenditure in widening all roads inside a particular IIT Campus. Any deviation from norms, either historical or from similar expenditure elsewhere, needs an investigation and explanation.
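The balancing idea can be made concrete with a toy example in base R. The node names and amounts below are invented, and the sketch assumes that over the period in question no node retains an unspent balance, so any gap between what a node receives and what it pays out counts as unexplained.

```r
# Each row is one directed edge: money moving from one node to another.
# Node names and amounts (in Rs crore) are invented for illustration.
flows <- data.frame(
  from   = c("Tax", "Centre", "Centre", "State", "State", "Panchayat"),
  to     = c("Centre", "State", "Salaries", "Panchayat", "Contractors", "Beneficiaries"),
  amount = c(1000, 600, 400, 350, 200, 300),
  stringsAsFactors = FALSE
)
sources <- c("Tax")                                        # green nodes
sinks   <- c("Salaries", "Contractors", "Beneficiaries")   # yellow nodes

nodes   <- unique(c(flows$from, flows$to))
inflow  <- sapply(nodes, function(n) sum(flows$amount[flows$to == n]))
outflow <- sapply(nodes, function(n) sum(flows$amount[flows$from == n]))

# For intermediate nodes, whatever comes in should go out again.
middle    <- setdiff(nodes, c(sources, sinks))
imbalance <- (inflow - outflow)[middle]

# Network-wide leakage: money entering at sources minus money reaching sinks.
leakage <- sum(outflow[sources]) - sum(inflow[sinks])

print(imbalance)   # which nodes do not add up
print(leakage)     # how much is missing in total
```

In this toy network the total leakage equals the sum of the imbalances at the intermediate nodes, which is why accounting for each node separately is enough to account for the whole graph.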

Can such anomalies and deviations be detected? And how does the RTI Act fit into this picture?

This graph is large, the number of nodes and edges is very high and the problem seems insurmountable. But we can divide and conquer the problem because many tasks can be done in parallel by independent groups. Instead of focussing on the entire graph at one go, we can zoom in on certain segments of the graph and examine specific nodes at higher magnification or smaller granularity. In principle, if the flow through every node is accounted for, then the flow through the whole graph gets accounted for automatically.

How do we conquer each node?

Since each node is a government organisation, it falls under the RTI Act and we can demand details of all its cash flows. Once this is in the public domain it can be examined by private volunteer investigators either manually or by automated software specifically designed for forensic audit. In fact if Google search bots, or software robots, can crawl through the entire web to track down, rate and index trillions of unstructured web pages, it would not be difficult to build software that can track down and reconcile each and every cash flow transaction in India provided the data is publicly available in a digital format.
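As a small taste of what such software might do, here is a sketch that flags a sudden spurt in one agency's spending by comparing each quarter against that agency's own history. The figures are invented, and the robust z-score with a cut-off of 3.5 is one common rule of thumb chosen for this illustration, not a prescribed audit standard.

```r
# Quarterly spending (Rs crore) by one hypothetical agency; the last
# figure is a sudden spurt. All numbers are invented.
spend <- c(10.2, 9.8, 11.0, 10.5, 9.9, 10.7, 31.4)

# Robust z-score: distance from the median in units of the median
# absolute deviation, computed with mad() from base R's stats package.
robust_z <- (spend - median(spend)) / mad(spend)

# Flag quarters that lie far from the agency's own norm.
# The threshold of 3.5 is an assumption for this sketch.
flagged <- which(abs(robust_z) > 3.5)
print(flagged)
```

A flagged quarter is not proof of wrongdoing. It only tells the volunteer auditor where to look first and what to ask for in a focussed RTI request.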

The results of such investigations and the unbalanced flows that are revealed should also be in the public domain and would be the starting point of either a formal CAG directed audit or citizen activism directed at the agency concerned. If the cash flows of a particular panchayat or a government agency do not add up or seem suspicious, affected parties should take this up locally either through their elected representatives or through more specific and focussed RTI requests.

This may seem complicated but is not really so. All that we are demanding is that bodies that deal with public money should publish their financial accounts into the public domain in a standardised format. Obviously the accounting format specified for listed companies may not be appropriate since assets and liabilities are accounted for differently and there is no question of profit or loss for public bodies. Instead, the focus is on the cash flow statement. Specifically, how much money is coming in? and where is it going to?

How will this actually work in practice?

First, the CAG and the Institute of Chartered Accountants of India will create a format to report all cash flows in public bodies in terms of a nationally consistent set of cost codes and charge accounts. Next the CIC will mandate that all public bodies must upload this information every quarter into a public website maintained by the CIC. As this information accumulates online, volunteer auditors, public activists, anti-corruption campaigners and even the CAG itself, if it wants to, can collaboratively build a website, like wikipedia or wikimapia, that pulls data from the underlying CIC website and displays the cash flow graph. Even as the graph gets built, people can start looking for missing or unusual flows.
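What such a format might look like is for the CAG to decide, but a rough sketch helps to show how little is needed. In the R snippet below the field names, cost codes and amounts are all assumptions made for illustration. Each line of a quarterly return becomes one directed edge of the cash flow graph.

```r
# One hypothetical quarterly return from a public body.
# Field names, cost codes and amounts (Rs lakh) are illustrative assumptions.
returns <- data.frame(
  body         = "Panchayat X",
  quarter      = "2015-Q1",
  direction    = c("in", "out", "out", "out"),
  counterparty = c("State Rural Dept", "Contractor A", "Wages", "Contractor A"),
  cost_code    = c("GRANT", "WORKS", "WAGES", "WORKS"),
  amount       = c(50, 20, 15, 10),
  stringsAsFactors = FALSE
)

# Basic checks that every upload could be put through.
stopifnot(all(returns$direction %in% c("in", "out")),
          all(returns$amount > 0))

# Turn each line into a directed edge (payer -> payee) so that returns
# from many bodies can be stacked into one national edge list.
edges <- data.frame(
  from   = ifelse(returns$direction == "in", returns$counterparty, returns$body),
  to     = ifelse(returns$direction == "in", returns$body, returns$counterparty),
  amount = returns$amount,
  stringsAsFactors = FALSE
)

# Collapse repeated payments between the same pair of nodes.
edge_totals <- aggregate(amount ~ from + to, data = edges, FUN = sum)
print(edge_totals)

# Balance for the quarter: received minus paid out.
unaccounted <- sum(returns$amount[returns$direction == "in"]) -
               sum(returns$amount[returns$direction == "out"])
print(unaccounted)
```

Because every return reduces to the same three columns (from, to, amount), the website that draws the graph does not need to know anything about the internal workings of the body that filed it.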

So there is no persistent workload on either the CAG or the CIC. However each public body must prepare its cash flow statement and put it on the CIC website. In any case, a public body is expected to keep a record of all cash flows -- through cash, cheque or EFT. This needs to be put into the CAG-designated format and uploaded periodically to the CIC website.

In fact the CIC website should already be ready because in October 2014,  the DoPT  announced that henceforth all replies under the RTI Act will be uploaded to the web in any case!

In the initial stages, the graph will have major discrepancies in cash flow and this could be because all agencies, or nodes, have not been identified or are not reporting information. At this point, activists can put in specific RTI requests to the concerned agencies to report their information in the CAG format and the CIC must ensure immediate compliance with the same.  Over a period of time and with a number of iterations the cash flows through all parts of the network should first become visible and then balance out. Compliance will be achieved by continuous RTI pressure at the grassroots, applied in a parallel and distributed manner, on defaulting government agencies.

When known outflows do not match inflows or if there are deviations from expected norms then activists can draw the attention of the media and opposition politicians who can raise a hue and cry. This should lead to the usual process of investigation where the CVC, the CBI or even the Supreme Court can get involved. After all,  Al Capone, the notorious US gangster was finally convicted on the basis of a forensic audit, not a shoot-out police operation!

Unlike the top-down Lokpal driven approach, this bottom-up strategy calls for no new law, no new agency, no new technology, no new infrastructure. All it needs is a format to be defined by the CAG and for the CIC to ensure that any RTI demand in this format be addressed immediately. With this, the flow of public money will become visible and if this transparency leads to a reduction of only 10% of the presumptive loss of Rs 2.5 lakh crores, it would still mean an additional Rs 25,000 crore for the Indian public every year.

19th century British India had the vision, the audacity and the tenacity to carry out the Great Trigonometrical Survey that created the first comprehensive map ( see diagram)  necessary to govern this vast country. Armed with the digital technology of the 21st century, a similar mapping of all public cash flows will lead to greater transparency in the governance of modern India.

A high resolution version of this image is available from wikimedia.

This article first appeared in Swarajya -- the magazine that helps India think Right
