September 26, 2015

Magic of SQL in scoring Data Mining Models

As a former DBA, I love the fact that SQL has a life of its own and is still in use decades after it was first conceived in the 1970s as the natural way to query Relational Database Management Systems. A vast amount of data in the world is still stored in SQL compliant RDBMS tables, and today, when business analytics and data science seem to be overshadowing SQL, I was delighted to find that it can still play a very important role in the implementation of complex data mining applications. This post explains how this evergreen tool remains very, very relevant in data mining.

Many data mining tools like Rattle and RapidMiner are used to create "models" for classification / decision trees and hierarchical clustering, but these models then have to be put into production by using them to score large datasets. This is where SQL can play a very powerful role. A model created in a data mining tool can be exported as a PMML document and then converted to SQL using any of the PMML-to-SQL tools available on the web. One such tool, PMML2SQL, has been used in this example. The generated SQL can then be run against data stored in an RDBMS table, and this new data will be classified or clustered as per the model. This is known as scoring and is widely used when a data mining model is put into "production".

In this example we begin with the well known iris data set (irisdata.csv) and import it into Rattle. Before doing so it is a good idea to remove the '.' from the field names (e.g. Sepal.Length becomes SepalLength), as the dots tend to interfere with the SQL code later on.
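Since the dots appear only in the header line of the CSV, one quick way to strip them, offered here as a suggestion rather than as part of the original workflow, is a sed one-liner:

sed -i '1s/\.//g' irisdata.csv   # delete all dots on line 1 (the header) only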

Then we create a Decision Tree model with this data

This model is then exported as a PMML file, irisdata_rpart.xml. The XML file is then uploaded into pmml2sql, which generates the corresponding SQL code

that we store in a txt file irisdata-rpart-sql.txt.
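The exact code depends on the splits in the tree that Rattle builds, but the generated SQL for a decision tree typically boils down to a nested CASE expression over the predictor columns. The sketch below shows the general shape; the split values and class numbers are illustrative, not the actual pmml2sql output:

SELECT SepalLength, SepalWidth, PetalLength, PetalWidth,
       CASE
         WHEN PetalLength < 2.45 THEN 1   -- setosa
         WHEN PetalWidth  < 1.75 THEN 2   -- versicolor
         ELSE 3                           -- virginica
       END AS PredictedClass
FROM irisdata;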

As a simple way to get started, we use the SQLite3 database that is available through the SQLite Manager add-on for the Mozilla Firefox browser, and use its csv import facility to create a table irisdata


and then define the datatypes of the different columns of the table irisdata

Once the table is ready, we copy the SQL command portion from the irisdata-rpart-sql.txt file and execute it


and see how the model has classified the data. In this case the training and production datasets are the same, so (a) the accuracy is very high and (b) we see that the classification has happened correctly. The three kinds of iris flower have been placed in categories 1, 2 and 3. This "scored data" can now be exported into irisdata-scored-rpart.csv for further use.

We now move to the hierarchical clustering model and again use Rattle to build the same


that we export as irisdata-hclust.xml, which is then uploaded into pmml2sql, and the corresponding SQL code is generated.

This is stored in irisdata-hclust-sql.txt for use against the actual data. However, we note that SQLite does not -- unfortunately -- support mathematical functions like SQRT and POWER, so we have to move to a more fully SQL compliant RDBMS like MySQL.
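The reason these functions are needed is that cluster scoring computes the Euclidean distance from each row to each cluster centroid and assigns the row to the nearest one. A sketch of the general shape of such a query in MySQL, with illustrative centroid values rather than the actual generated code:

SELECT t.*,
       CASE
         WHEN d1 <= d2 AND d1 <= d3 THEN 1
         WHEN d2 <= d3 THEN 2
         ELSE 3
       END AS cluster
FROM (SELECT irisdata.*,
             SQRT(POWER(SepalLength - 5.0, 2) + POWER(PetalLength - 1.5, 2)) AS d1,
             SQRT(POWER(SepalLength - 5.9, 2) + POWER(PetalLength - 4.4, 2)) AS d2,
             SQRT(POWER(SepalLength - 6.6, 2) + POWER(PetalLength - 5.6, 2)) AS d3
      FROM irisdata) t;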

The iris data already stored in SQLite is exported as an SQL file, irisdata-createtable1.sql, that contains both the table creation script and the data insertion script. A few cosmetic changes have to be made to make this script compatible with MySQL: typically, the quotes around the table name have to be removed and the datatypes of the columns have to be specified more precisely. The changes are reflected in irisdata-createtable2.sql. This script is now executed in a GUI client (in this case SQLyog) of MySQL running on Windows 7.


and a new table irisdata is created


Now we pick up the SQL code available in irisdata-hclust-sql.txt and execute the same against the data available in the irisdata table.

Once again we see that the cluster scores 1, 2, 3 reflect the original three kinds of iris flowers that were available in the iris dataset. This data can now be exported to a CSV file irisdata-scored-hclust.csv for further analysis. All the files referred to in this post are available for download in this zip file irisdataSQL.zip.

The beauty and value of this technique lies in the fact that your data mining tool need not have the ability to handle very large datasets. Any model built with a statistically reasonable amount of data in these tools can be "moved into production" and used against very large datasets stored in rugged, reliable RDBMS systems. The only requirement is that the column names in the RDBMS must be the same as the variable names used in building the data model. If this is not the case, then the SQL code must be changed to reflect the different column names in the RDBMS table.
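Rather than editing the generated SQL, an alternative is to create a view that maps the production column names to the variable names the model expects; the production table and column names below are hypothetical:

CREATE VIEW irisdata AS
SELECT sep_len AS SepalLength,
       sep_wid AS SepalWidth,
       pet_len AS PetalLength,
       pet_wid AS PetalWidth
FROM production_flower_data;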

In fact, if the quantum of data is even higher than what a standard RDBMS can handle, then the data can be loaded into Hadoop / HIVE and HQL (a very close cousin of SQL) can be used, along with the ultra high scalability of a Hadoop cluster, to achieve the same result.

September 24, 2015

DIY IOT : Public Chat Servers to transport data over Internet

Continuing with my story of Yantrajaal and the Internet of Things ....

A key challenge in building the "Internet of Things" is to be able to connect a device to a computer over the internet using as simple and lightweight an infrastructure as possible. In this post we demonstrate how a public XMPP chat server can be used to transmit data and commands from one device to another, using a chat client at one end and a python "bot" sitting at the other end. We will demonstrate the ability to INSERT data into an SQLite database, SELECT records from the same, play a variety of .wav files and execute any system command on a "central" machine from any distant machine that supports an XMPP chat client.



Before starting on this exercise, we searched the web for prior activity in this area and came across this webpage that suggests a similar approach for controlling devices over the internet, but the strategy explained here is simpler to code and implement.

We looked for a list of public XMPP chat servers and selected the Ad Astra service, perhaps because it was the first on the list! We next registered two userids, say x1@adastra.re and x2@adastra.re to be used on the two different machines using the web-registration form, because the in-band registration does not work!

We next installed the Pidgin chat client on two different Ubuntu machines and verified that text messages were being sent and received through the Ad Astra server without any problem.

Since working with two machines on a desktop is difficult, we configured our experimental setup as one Android phone acting as the distant machine and an Ubuntu 14.04 laptop acting as the central server. Commands transmitted from the Android phone using the Xabber chat app would be received on the server and acted upon.

For the server side we configured a chatting robot, or chatbot, in Python using the xmpppy library, with the code for the sample bot as a starting point. The sample bot program has a number of sample "handlers" that perform simple tasks. These were modified to perform the following tasks
  • PUSH - to insert a piece of data into an SQLite database
  • PULL - to select a record from the SQLite database
  • SOUND - to play a selection of .wav files using the aplay command available in Ubuntu
These tasks were executed by calling a shell program and passing the relevant parameters through the python subprocess module, as in the sketch below.
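A minimal sketch of such a handler using xmpppy follows; the assumption that kmd.sh takes the command word and its parameters as positional arguments is ours, and the actual chatIOT.py on Github may differ:

import subprocess
import sys
import xmpp

def messageHandler(conn, msg):
    body = (msg.getBody() or "").strip()
    parts = body.split()
    if parts and parts[0].upper() in ("PUSH", "PULL", "SOUND"):
        # hand the command and its parameters to the shell script ...
        reply = subprocess.check_output(["./kmd.sh"] + parts)
        # ... and chat the output back to whoever sent the command
        conn.send(xmpp.protocol.Message(msg.getFrom(), reply))

jid = xmpp.protocol.JID(sys.argv[1])      # e.g. x1@adastra.re
client = xmpp.Client(jid.getDomain(), debug=[])
client.connect()
client.auth(jid.getNode(), sys.argv[2])   # password is the second parameter
client.RegisterHandler("message", messageHandler)
client.sendInitPresence()
while True:
    client.Process(1)                     # wait for incoming messages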

There were three other handlers
  • EXESYS - to execute any system command on the central machine
  • TEST - to echo the command received back to the sender as a debugging measure
  • HELP - that lists all the possible commands.
Before the chatbot starts, we need to install SQLite and create a database and, within it, an RDBMS table.

The chatbot ("chatIOT") is a python script that needs to be started with the userid@servername.com and the password as the parameters. It then listens for text messages and responds as appropriate to the six commands listed above.


The Xabber chat client on the Android phone is now used to send messages as follows



The left hand image, a screenshot of the Xabber client on the Android phone, shows that the value PULLed from the database is 100.4. Then we PUSH 51.29 into the database, and the subsequent PULL shows the two values, 100.4 and 51.29, that are stored in the persistent database.

The second, right hand, image shows how the unix command ls for a directory listing is executed and its output returned. Note the three .wav files that can be played by sending the SOUND command along with a number that indicates which file is to be played.

How realistic is this setup in simulating a real IOT scenario ?

All that the distant device needs is a chat client. If it is an Arduino board then it is possible to create a chat client for it, as explained here and elsewhere. If the Arduino is connected to a Raspberry Pi 2, then life is even simpler, because with python it is very easy to create a simple chat client - xmit.py
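A sketch of what such a one-shot sender might look like with xmpppy; the actual xmit.py is on Github, and its command-line convention may differ from the one assumed here:

import sys
import xmpp

# usage: python xmit.py sender@server password recipient@server "message text"
jid = xmpp.protocol.JID(sys.argv[1])
password, recipient, text = sys.argv[2], sys.argv[3], sys.argv[4]

client = xmpp.Client(jid.getDomain(), debug=[])
client.connect()
client.auth(jid.getNode(), password)
client.send(xmpp.protocol.Message(recipient, text))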

If one is not comfortable with messages being transmitted through a public chat server, then one can build one's own chat server using the free and open source OpenFire software and host it on a secure but publicly visible IP address. But even otherwise, the standard userid / password mechanism of a public chat server offers decent security.

What we have demonstrated in this exercise is how to build a secure data transport mechanism using publicly available components and minimal programming. IOT developers and enthusiasts can focus on the behaviour of the edge devices without having to worry about the "plumbing" that moves the data from one machine to another.


The following files are available on Github:
  • the main python "bot" script, chatIOT.py and the shell script kmd.sh that is called by chatIOT.py to execute the SQLite and sound commands.
  • createIOTdata.sh and viewIOTdata.sh to create the SQLite tables and view the data
  • three .wav files for the three sounds
  • xmit.py - a lightweight xmpp chat client for sending text messages

For more posts and information about IOT, please visit the IOT-HUB

September 16, 2015

W{h}ither Privacy ?

Privacy, or the Right to Privacy, has recently entered the public debate in India because the government has claimed in the Supreme Court that this right is not fundamental to the life and liberty that is otherwise guaranteed in the Constitution. But before we weigh in on this debate, let us consider how much privacy modern, technology enabled individuals actually enjoy today.

All of us use Google directly - for search, for mail, for watching videos, possibly for cloud storage. We also use Google Maps when we use Uber, Zomato and similar location based apps on our Android phones that use its Global Positioning System (GPS) features. Facebook is our preferred way to keep in touch with family, friends and acquaintances with whom we share updates and pictures and respond to their updates with comments of our own. There are many other services that we use but let us stick to these two that are most ubiquitous and are immensely popular -- and of course, the icing on the cake is that they are free. Free as in “free beer” that is! Use it, have fun with it and no need to pay anything in return.

But is it really free? It obviously is, in the narrow monetary sense, but let us look beyond the obvious.

Do you know how much Google knows about you? Actually quite a lot. First, it knows the things or topics that you are searching for and second, as a corollary, the websites that you subsequently visit. So it knows if you have an interest in fine art photography or in pornography! Technically speaking, it does not know about you as such, but about the "anonymous" person sitting at your computer and using the browser; but the moment you log in to Gmail -- with your Gmail account -- it can immediately connect you, the Gmail user, to the person using the browser, and your anonymity is blown for ever. This identification extends to each and every machine that you use to access Google services -- the laptop, the tablet and the smartphone -- and very soon Google has a pretty good idea of the kind of person that you are. But this isn't all -- because Google also reads your email and knows whether you are planning a visit to the Andamans with your family or discussing your investment plans with your financial advisor.

But Google is not alone in this. Facebook is just as curious about you and possibly goes one step further in knowing about your friends and "relationships". You begin with the minimal mandatory information about your name, email and date of birth, but as you post updates and comments and visit websites, it gets to know you more and better. Powerful text and sentiment analysis tools determine whether you are, for example, a right-wing computer programmer, a left-wing machinery salesman, or an ISIS leaning university professor. Even if you are very careful about revealing details about yourself, it knows that "birds of a feather flock together", and so it checks out not just your "friends" but also the people you have sent friend-requests to and the people whose friend-requests you have denied. Predictive statistics and machine learning techniques are used to connect the dots and arrive at conclusions that will surprise you with their accuracy -- this author was surprised to learn that Facebook knew that he lived in Bhowanipur, a locality in Calcutta, that he had never, ever consciously referred to in any communication. In fact, a website called Digital Shadow -- a promo for a role playing computer game set in a dystopian, privacy-poor Chicago, and unfortunately not accessible outside the US -- shows how much of your private information can be extracted from your Facebook profile and used to create a dossier that looks suspiciously similar to one prepared by assassins! One of the creepiest features of Digital Shadow is that it tries to guess which of your "friends" on Facebook could be used against you.

But why are Google and Facebook tracking, or rather stalking, their users? The obvious answer is that they want to offer you better services and, in the process, lock you into continuing to use their networks. For example, Google claims that it can customise your search results to make them more relevant for you -- a computer programmer and a zoologist both searching for "python" would be led to different websites, one for the programming language and the other for the reptile. Facebook claims that it can reunite you with people who really matter to you but with whom you have lost contact since childhood. But the real motivation for tracking and knowing users is that this knowledge is used to show advertisements -- the only source of real income for both companies -- that are relevant to you. If the subject of an ad is of interest to you, you are more likely to be tempted to click on it, and bingo, that is when they will bill the advertiser for the click-through. The entire business model, and the humongous revenue stream, is critically dependent on knowing you well enough to be able to predict which ad you are likely to click on. Which is why Google and Facebook will go to any extent to extract information about you, and to be able to do so they are willing to offer you more, better and more customised services to lure, and lock, you into their networks.

In this context it may be a good idea to remember that whenever you get something free -- free as in beer, that is -- you are the product that is being sold to someone else for a price. Without being aware of the fact, you are monetising your personal information and using it as a “currency” to purchase the “free” services that are on offer.

But before you view Google and Facebook as evil ogres and rush to cancel your accounts and delete your profiles -- which you would never really do anyway :-) -- think again. Not only do these companies not have any evil designs on your health or wealth, they also give you enough opportunities to stop them from acquiring this information. With a little bit of effort you can figure out what data about you is being captured and can then configure your accounts to allow only the information that you are comfortable with sharing. The other good thing about this data collection process is that the raw data is, in general, not made available to any third party unless there is a court order to make it available to an authorised government agency.

So in effect, Google and Facebook can collect your personal data but technically, you have a way to turn off the tap, or at least reduce the flow significantly. But can you really do so? That’s the catch! Consider the following ..

Google Maps is an amazing service, not only for taking flights of fancy over the Taj Mahal or the Great Wall of China, but also for navigating around town or going on a long drive through the countryside. For this, the GPS feature of your Android phone is used to determine your position and locate you on Google Maps. Unfortunately, your location is also stored by Google, and the history of all the places that you have ever visited can be accessed by logging in with your Gmail account. In fact, even in places where there is no cellphone coverage, the location is captured and then uploaded to Google as soon as the cellphone can access the network. So even when you are off the grid, as the author was in Ladakh recently, your location and approximate movements are available in your location history! That is quite a big hit on your privacy. You should also be aware that any picture that you take with a phone or a modern digital camera is encoded with the GPS determined latitude and longitude of the place where the picture was taken, so even an indoor picture of any subject that is uploaded into Facebook carries with it the physical location of the photographer.

You may react to this in one of two ways. You may argue that since you are not a terrorist or involved in anything illegal, you do not care if your location is known to Google and, by extension, to anyone who can access this data. But someone who is more privacy conscious may decide to use the facility that Google offers to turn off the process that determines location. The second option may seem to be a good idea, but it will lead to some major inconveniences!

You would no longer be able to use Google Maps to know where you are. More importantly, it would also disable a whole raft of location enabled applications like Google Sky Map; My Tracks, which helps you determine the length and duration of your morning jog; and, horror of horrors, the very useful taxi hailing apps like Uber and Ola. Actually, all smartphone apps gather and transmit a huge amount of personal information, including but not limited to location, behavioural patterns and even contact names. This is why most eCommerce sites like Myntra and now Flipkart are desperate to migrate users from websites, which need a browser, to the mobile phone with its dedicated app. Of course you could keep toggling the location feature on and off as and when you need it, but that would be very inconvenient. So this is the tradeoff -- if you want the convenience of something useful, like booking tickets online or hailing cabs, then you must "pay" for it with your "personal data." Are you willing?

Actually this tradeoff is not new. If you roll back a couple of years to an earlier generation of technology, the very fact that you use a cellphone means that you are revealing your location. Moreover, a cellphone is a personal device, and your call records -- which are available with the telephone company -- show the network of people that you are connected with; this information is not really any different from the network of your "friends" that Facebook is aware of. Similarly, when you use a credit card, the bank gets to know both your location and the nature of the goods or services that you have purchased. All these are serious violations of privacy, but can we afford to stop using cell phones and credit cards without facing serious inconvenience?

But even this tradeoff is not new. Let us roll back a couple of centuries, or decades in some societies, and look at another kind of tradeoff. Would you want your womenfolk to step out of the house? For a lady to come out of purdah and visit a location outside the harem is also a violation of privacy! Outsiders may get to see what she looks like and also know who all she is visiting. Is that acceptable?

Actually it depends on where you draw the line and decide what level of privacy you are comfortable with. Unfortunately there is no unique and universal way to define the point where the tradeoff between privacy and convenience is not acceptable anymore. Someone may find it acceptable to keep their women behind closed doors, others may choose not to use credit cards and cellphones, still others may not want to use Google and Facebook and then there can be someone else who would refuse to give his biometric data to get the Aadhar card.

So we need to shift the debate from privacy per se to the consequences of the loss of privacy. If a woman goes out of the house, then instead of questioning her loss of privacy, we first need to make sure that she feels safe enough to do so, and then, should she be kidnapped, raped or otherwise violated, the full force of the law should be applied to the perpetrator of the violation. So the real question is not that of privacy but of the rule of law.

Once the focus shifts from ensuring privacy to enforcing the rule of law, the issue becomes much simpler to address. There is no difficult tradeoff between privacy and convenience or between privacy and security. Instead we have the rule of law, and that by definition should be absolutely supreme. In practical terms, this translates into making sure that before any service is offered in exchange for personal data there must be a clear and unambiguous contract, or terms of service, that specifies the end use precisely -- who will get to see the data and why. This practice, followed by all ethical and honest websites that offer technology enabled services, should not only be adopted by the government but should also be enforced by laws of contract and tort through the judiciary.

But unlike Google and Facebook, the government offers services for which it is the only, monopoly service provider -- eg, birth, death and marriage certificates -- and which the citizen cannot afford to live without. So there must be a by-pass, a manual override, for individuals who prefer privacy over convenience. Well known privacy advocate Richard Stallman is one such man: he does not use credit cards, cell phones, Google, Facebook or Amazon because he values privacy ahead of convenience. Most of us, on the other hand, harassed as we are by the pressures of modern life, are more than happy to sacrifice our privacy for the convenience of modern technology enabled solutions. The government must cater to both categories of people.

As technology advances and systems become critically dependent on digitized information, prevailing ideas about privacy will become obsolete -- just like the harem and the purdah. But as privacy fades away we need something else -- rules, processes and guarantees -- that will ensure that the loss of privacy does not compromise the well being and dignity of the individual. Can we trust, and force, the government to be honest with its citizens? Not just with privacy but in every other area of governance. That is the real issue that civil society must grapple with.

This article first appeared in SwarajyaMag.

September 09, 2015

DIY IOT - Internet of Things or Yantrajaal

In June 1999, when this domain YANTRAJAAL was created I had envisioned and written that :
The World Wide Web has outgrown its initial concept of a mere network of computers and is being seen today more as a way of life. As it thrives and grows, this web is encompassing and pulling into itself a greater and greater diversity of devices – set top boxes, smart television sets, terrestrial and cellular phones, palmtops, network computers and very soon common household gadgets like refrigerators and microwave ovens. And along with these gadgets, our entire lifestyle is being dragged into this great web or Jaal. YantraJaal is an eZine that reflects this new reality. In Sanskrit, "Yantra" is an artifice or a device and "Jaal" is the net or the web. YantraJaal, thus, represents the web of connected devices.
Today, the Internet of Things or IOT is finally becoming a reality, but most of us are still grappling in the dark about how to move beyond simple web servers and the world of blogs, wikis and social media. Here is how one can start taking baby steps into the world of IOT or Yantrajaal.


One of the primary requirements of IOT - the "Internet of Things" - is to collect data from remote devices over a TCP/IP internet connection and use the same for analysis. Since the number of devices is expected to be very large -- far larger than the number of possible IPv4 addresses -- it is expected that IPv6 addressing will be used. However, the sluggish rollout of IPv6 enabled network devices has necessitated the use of intermediate "broker" services that allow the collection of data from devices and then publish the same for subsequent analytics.



This post describes the architecture of such a system and provides sample code that can be used to get a basic IOT sensor data collection system going.

Since the data will be travelling over the internet, the edge device must be a TCP/IP enabled computing device to which the sensor is connected. The simplest possible computer would be a Raspberry Pi 2, or similar machine, running some lightweight flavour of Linux. In this case the sensor is simulated by a small shell program that emits two pieces of data: (a) a sensor ID and (b) a numeric value, of temperature or pressure or voltage, that has been recorded by the sensor. With a real sensor device, this program would read the value from the sensor through a device driver program.

Once the data is available to a shell program (we will call it ISD_pushData.sh), it must be pushed to a central server, and the easiest way to do that is with the curl command that is available in any Unix distribution. Where does one find such servers?

Companies like Carriots and GroveStreams offer services that allow one to define a "device" that uses curl to send data in JSON or XML format to a datastream, where it is stored for subsequent analysis. Carriots in fact offers a free service in which one can connect up to ten devices from which to collect, store and display data. Simple tutorials are available through which one can learn "How to send a stream using curl" and "How to create triggers" that will initiate actions based on the value of the data received.

After working through these tutorials it becomes very evident that a similar service can be created on an Apache/MySQL/PHP platform of the kind widely available from any webhosting service like x10hosting or hostinger; the free versions are good enough for our purpose. This post in tweaking4all shows how this can be done and forms the basis of this post.

What we need is (a) a MySQL table to store the data, (b) a shell program that will send data using curl to a destination URL on the web server and (c) a PHP program, available at the destination URL, that will accept the data passed as parameters and insert it into the MySQL table.

The SQL command to create the MySQL table looks like this (Create_IOT_SensorData.sql):

CREATE TABLE `IOT_SensorData` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY COMMENT 'unique ID',
`event` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Event Date and Time',
`sensorID` VARCHAR(30) NOT NULL COMMENT 'Unique ID of the sensor',
`Value` NUMERIC(4,1) NOT NULL COMMENT 'Value of data recorded'
);


This PHP program (ISD_pushData.php) sits on the web server and accepts the data

<?php
    // Connect to MySQL (credentials are defined in dbConnect.php)
    include("dbConnect.php");

    $con = mysql_connect($MyHostname, $MyUsername, $MyPassword);
    if (!$con)
     {
       die("Could not connect: " . mysql_error());
     }
     mysql_select_db($MyDBname) or die(mysql_error());

    // Prepare the SQL statement
    // Note: the GET parameters are used unsanitised here for simplicity;
    // a production version should escape them to guard against SQL injection
    $SQL = "INSERT INTO IOT_SensorData (sensorID, Value) VALUES ('".$_GET["sensorID"]."', '".$_GET["Value"]."')";

    // Execute SQL statement
    mysql_query($SQL, $con);

    // Send mail: the SQL statement itself is the message body,
    // and the subject line acts as a simple threshold alarm
    $mailMSG = $SQL;
    $mailDEST = "someone@somewhere.com";
    if ($_GET["Value"] > 50){
      $mailSUB = "Warning : Value HIGH";
      } else {
      $mailSUB = "Value Normal";
      }
    mail($mailDEST, $mailSUB, $mailMSG);

?>


The PHP program on the web server accepts the data from the following shell program (ISD_pushData.sh) running on the remote Linux machine. Instead of reading a value from a sensor, the program generates a random number and sends it.

p1="http://prithwis.x10.bz/IOT/ISD_pushData.php?sensorID="
sensorID="1003B"
p2="&Value="
Value=$(shuf -i 1-80 -n 1)   # random number being generated for Value
URL=$p1$sensorID$p2$Value
echo $URL
# -------------------------------------------------------
curl $URL
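To simulate a sensor that reports at regular intervals, the script can be run from cron on the remote machine; the path in this crontab entry is illustrative:

* * * * * /home/pi/ISD_pushData.sh   # push one reading every minute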


In fact, the PHP push program on the server is not only inserting the data into the MySQL table but is also acting as a trigger, by sending two different types of mail depending on the value of the data that is received from the remote sensor.

Once the data is available in the MySQL table, it can be displayed (using ISD_viewData.php) as

In fact, one can run the shell script on any machine and the new value will appear when this page is refreshed!

It is also possible to visualize the data graphically (using ISD_graphData.php)

All the codes used in creating this post are available on Github. The graph has been created with free tools available from JpGraph and the specific graph shown here is based on the sample shown here.

In this post we have demonstrated how data lying on a remote Linux machine (that is possibly connected to a physical sensor) can be pushed into a webserver and subsequently used for data analytics.

September 01, 2015

The Social Media route to Digital India

Can Digital India be launched on the back of a private social media network ?

E-Governance is a concept that has fascinated bureaucrats, academics and the IT industry in India for many years, but the term is ill-defined. At one end of the spectrum we have government departments that put up web portals with static, mostly obsolete, data, while at the other end we have useful applications for, say, passports and income tax. This great disparity in sophistication and utility arises because each such application is the result of an independent initiative and reflects the vision of the owner and the competence of the vendor who was awarded the tender on the L1 (lowest cost) basis. In the corporate sector this is referred to as the "Thousand Island" scenario -- with "islands" of automation separated by gaps of inconsistent data -- and the common solution offered is an "ERP" like SAP that can tie together all parts of the organisation with one coherent piece of software. Unfortunately there is no such universal platform for government requirements, but now, with a new push for Digital India, we could perhaps envisage something similar -- based on the generic idea of social media networks like Facebook.


Why social media ? Because this is a platform that is very simple and convenient for people to use and relate to -- no one has to be taught how to use Facebook or LinkedIn! There is something intuitive about "friendships" and "group membership" that anyone can relate to and participate in through status updates, comments and "likes". So if we create a private social media network -- let us call it publiNc, a Facebook look-alike -- it should be fairly easy for government employees to join and use it.

But use it for what ?

Since a large part of government work consists of the collection and dissemination of information, a social media platform, with its collection of blogs, forums, posts, comments, replies, document attachments and messaging facilities, could be a simple starting point. Currently employees of the government operate on a diverse set of platforms, but in publiNc they will have integrated messaging, email, chat and VoIP voice -- all available on desktops as well as smartphones. Individual departments (and sub-departmental groupings) can be reflected as publiNc communities, with their own "pages" or "groups", some of which could be closed and private while others could be open with some degree of moderation. Departmental administrators, who currently arrange for tea, tables, cubicles, cars etc., can be trained to be "group admins" responsible for moderation. By facilitating all communication within the government, publiNc will be the first step towards Digital India.

But there is more to government than sharing information. Information has to be recorded, processed and displayed on demand -- through database driven transactional applications. On social media we see this in the form of thousands of games or apps -- independently developed software like Candy Crush, FarmVille etc.  -- that sit on top of and provide diversity to the social media experience. So would it be on publiNc. Individual departments, or “communities”, can have their own software developed as an “app” and users who need this functionality can add it to their profiles. publiNc apps built for one department and found to be useful can also be used by similar departments operating in other parts of the country. Since these “apps” are generally hosted on independent hardware, issues related to security and privacy can also be addressed.

Another key requirement in government is approvals, along with audit trails. This is a straightforward workflow application where an electronic document moves through a series of predefined named individuals who must take some action, that signifies approval, before it moves to the next person. In social media terms, this means that after a user has taken some action with a publiNc app, the app must automatically post an update on the wall of the next person along with a link that will lead him back to the same app. This is no different from tagging friends on a post or Facebook games that automatically invite friends to come and play the same game. This is a great irritation on Facebook but is perfectly suited for our publiNc workflow since it sets an automatic reminder for the next person to take action.

Finally, privacy settings for individuals, communities and apps will ensure that only the right people will know who is doing, or not doing, what and for how long.

The next step would be to allow private citizens to open accounts in publiNc so that they too can communicate with government employees. But before being activated, all such accounts should be verified against some government document like a PAN card, passport or Voter ID card, and a KYC-compliant mobile number, to establish the authenticity of the user. With such an account, citizens should be able to communicate with individual government employees or with the "pages" created by departments for such purposes. Questions posted on such pages can be answered by designated employees and, depending on privacy settings, can be viewed by many. Similarly, citizens can be allowed to install and use certain apps that allow one to send applications, pay fees and taxes or do anything else that would otherwise need a visit to a government office and a wait in a queue in front of a counter. Responses from the government department can be posted on searchable forum posts or sent to private inboxes.

So now we have a complete digital ecosystem that could link all parts of the government with each other and, potentially, with each and every citizen of this country. Normally, such a gigantic system would take years to build and need thousands of computers to run on, but the beauty of social media is that it is immensely scalable. When Zuckerberg created Facebook in 2004 it was a tiny system meant for a few students of Harvard University. So can it be with publiNc. We can begin with a small system, with a few apps, that caters to the needs of a couple of departments at the Centre and a few of the more dynamic States. Then, just like Facebook, publiNc should attract people, both government employees and citizens, and add more and more e-governance apps until it becomes the behemoth that Facebook is today -- spanning the whole world and having more than 1 billion users -- about the same as India's population today.

The technology for building a social media network like publiNc is very much available, and much of it is in the public domain. Issues related to scalability, stability and security are all known and easily addressable within reasonable cost and time. While 80% of India's population may not have access to the internet, the 200+ million who do have access constitute a large enough pool worth catering to, and with the proliferation of smartphones the number of publiNc users will grow by leaps and bounds. So there is no doubt that the system can be built very easily and can be readily used both by government employees and by citizens.

But the real challenge lies elsewhere. Will our government employees want to use such a system ?

Can our politicians and bureaucrats tolerate the intense transparency that a system like publiNc will bring to the country? While privacy settings may keep many actions and decisions out of the public view, all actions taken -- or not taken -- will be automatically noted, recorded, stored and may be retrieved by duly authorised agencies. This can be a serious blow to the discretionary powers enjoyed, and abused, by government employees. Hence the biggest challenge to publiNc would come from entrenched vested interests who would be reluctant to allow the system to work. There is a clear and present danger of publiNc being sabotaged from within.

But this is not a problem restricted to publiNc alone. It is something that can plague any e-Governance system created to support the vision of Digital India. So there can be no purely technology based approach that will facilitate or expedite the adoption of e-Governance applications -- we need a "social" approach, one driven by peer pressure, or the need to conform to the environment.

This is where social media based software is so different from the typical IT systems that we use in banks, shops, government agencies and other commercial environments. No one needs to force anyone to join and use networks like Facebook or LinkedIn. People not only find these systems useful and easy to use, but, most importantly, without them they feel that they are being left behind by their peers. Hence the urgent need to join and catch up. So if a few pioneering departments adopt this social media approach, most of their employees will willy-nilly join in, and then other departments can follow one after the other. Adoption can be expedited, however, with a few useful and universal apps like leave applications, provident fund or new pension scheme updates and so on.

Finally, one can ask if such a social media based approach to e-Governance has been tried in any other country. Possibly not, but for a change, let Digital India show the world a new way of doing things, instead of forever copying what is happening elsewhere.
-----------------------------------------------------------------------------------
This article originally appeared in the August Issue of Swarajya, the magazine that reads India right!