October 23, 2016

Pokémon Go and the Ghostly Metaphor

Pokémon Go is a relatively new computer game that has, since July 2016, taken the world by storm. As far as its rules go, it is not really different from computer games that have been around for the last twenty years -- the player goes around locating and collecting objects of interest to earn game points. But the real impact is the introduction of an all new level of technology whose potential is yet to be understood by most of us. Unlike every other computer game that you can play from the comfort of a desk or a couch, Pokémon Go needs you to walk around the neighbourhood with your smartphone and “catch Pokémons”. The catch here is that the game merges the virtual reality of Pokémons with the physical reality of the neighbourhood Google Map. Since the game is aware of your location, you need to walk down actual roads, turn past, or enter, actual buildings and then, and only then, will you “see” the Pokémon in your smartphone. If you turn on the smartphone camera, the game cleverly superimposes the hypothetical Pokémon that you want to catch on the actual image that the camera is showing so that it seems as if the Pokémon is really in the room or on the road. This is augmented reality -- where the “virtual” is superimposed on the “real” in a way that makes it difficult to distinguish one from the other.
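For readers who like to see the idea in code, here is a minimal sketch of the location check described above: a creature becomes "visible" only when the player is physically close to the spot where it has been placed. This is not how the actual game is implemented; the coordinates, the creature names and the 50 metre radius are all hypothetical values chosen for illustration, and the distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius of the Earth in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def visible_creatures(player, creatures, radius_m=50.0):
    """Return the names of creatures within radius_m of the player.

    player    -- (lat, lon) reported by the phone (hypothetical input)
    creatures -- dict of name -> (lat, lon) where each creature is placed
    radius_m  -- assumed visibility radius, not a value taken from the game
    """
    return [name for name, (lat, lon) in creatures.items()
            if haversine_m(player[0], player[1], lat, lon) <= radius_m]


# Hypothetical positions: one creature a few steps away, one across town.
player = (22.5448, 88.3426)
creatures = {
    "near_one": (22.5450, 88.3427),
    "far_one": (22.5600, 88.3600),
}
print(visible_creatures(player, creatures))
```

Only the nearby creature passes the check; the player has to walk towards the other one before it would show up on the screen.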

image from gizmodo

Augmented reality is not new. Digital data has long been superimposed on physical images in the heads-up displays used by fighter aircraft pilots. But this is the first time that this technology has come down to the mass usage level through an inexpensive smartphone. Pokémon Go has the potential to be what the spreadsheet was to computers in general and the Mosaic browser was to the world wide web -- a path breaking “killer application” that leads to a sudden jump in the usage of a particular technology. Other than the obvious benefit of forcing players to actually exercise their bodies, Pokémon Go has opened up new opportunities for doing business. For example, physical locations like shopping malls and restaurants can be quickly populated by virtual Pokémons to lure footfalls and customers. Similarly, virtual characters can “appear” in thinly populated physical locations during an emergency to guide people towards areas of safety.

As a game, Pokémon Go may be a bit of an anticlimax and a smartphone camera is not a very sophisticated display device. But together, they show us the potential of what is possible when we have a convergence between the real and the virtual. Massively Multiplayer Online Role Playing Games (MMORPG) like World of Warcraft and Final Fantasy create fantastic virtual worlds where thousands of players “play” and interact with each other while executing quests that lead to goals and rewards. Players, represented by avatars in these games, move through computer generated landscapes, communicate, trade or fight with other avatars to acquire benefits. Earlier these games used to be played on regular computer screens with keyboards and joysticks. But now we have virtual reality (VR) headsets ranging from the inexpensive Google Cardboard that works with any Android phone, through more sophisticated products like Microsoft Hololens, Oculus Rift,  to dedicated gaming devices like the Sony Playstation VR. Using a range of physical sensors, these devices track the movement and position of the user’s body and change the visual perspective so realistically that the mind is fooled into believing that the events are happening in real life and not in the display of a computer.

Going forward in the direction pioneered by Pokémon Go, all these games can and might superimpose the real physical landscape, and people, into the virtual, computer-generated landscape -- or perhaps the other way around. At its Developer Forum in August 2016, Intel unveiled Alloy, a standalone VR device that, if placed on the head, will completely isolate the wearer from the physical world and instead immerse him in the virtual. In parallel, Microsoft is working with Intel to create the capability of generating three dimensional holograms that will be visible in the physical world using Hololens and Windows 10. This would mean, for example, that a player with a VR headset will meet and interact with a virtual character in a real room while other real people in the room will see nothing since they do not have the VR headset. An augmented reality movie will allow a user to see the actors running around a publicly accessible physical location like the Taj Mahal. The possibilities are endless.

All this is possible with technology that is already available today. But let us now take a leap of imagination and see what could be happening next. Today’s virtual reality devices are big clunky devices, somewhat like binoculars or helmets. They fit in front of the eye or on the head to eliminate the physical world and replace it instead with the virtual world. In augmented reality, the real world will also be allowed to creep into the field of view and will be merged seamlessly with the virtual so that as we discussed earlier, virtual objects will appear in a real setting and real objects will appear in a virtual setting. This is where the mind will start getting confused -- what is real and what is virtual? At the moment this question can be resolved very easily if the person simply takes off the headset. Immediately the virtual will disappear and only the real will be left behind.

But what if he cannot or does not wish to remove the headset?

What if the headset is reduced to the size of a normal pair of spectacles, as in Google Glass, or even smaller, to the size of a contact lens that is implanted on the cornea of the eye? For such a person, the dividing line between the real and the virtual will disappear completely and he will never know what is real and what is virtual. This is very similar to driving a car whose windshield and windows are replaced with computer screens that show the “outer world”. Initially the outer world will be real world streets, buildings and cars as captured in realtime by cameras, but then the screens can display a desert or jungle and the driver would think that he was indeed driving in some other terrain. In fact the illusion would hold even if the car did not move and only the images changed. But what would be most interesting, or confusing for the driver, would be if we could augment the physical reality of streets and buildings with virtual images of non-existent cars that are whizzing past. With sounds and vibrations being built in, reality will become a confusing continuum stretching from the physical world through the augmented world and then to the virtual world. No one will know where one ends and the other begins.

Spectacles, contact lenses and in-ear hearing aids have already become an intrinsic extension of our bodies and act as an intermediary between our brain and the external world of sensory data. What if these were to become so ubiquitous and smart that we forget that they exist and are modulating the data that our brain processes to understand the “reality” of the external world?

Would we still know what is physically real and what is virtually real? Which leads us to the next, bigger question -- what indeed is “really” real?

Sankara’s Advaita Vedanta posits that nothing is real, the world is an illusion, an error that we make in our perception -- no different from the virtual image that we might see through a Microsoft Hololens! Some of us might baulk at this repudiation of objective, tangible reality and insist on evidence of the existence of alternate worlds. However, as argued by the author in “Logic to  Magic” in the September 2016 issue of Swarajya, the world of “science” itself is swinging around to look at additional dimensions where other worlds may exist. So there is nothing wrong in exploring worlds that may lie beyond the only one that we are familiar with.

Once we accept that real and virtual worlds might coexist and that the human mind may not be able to delineate where one ends and the other begins, ghosts, spirits and other astral bodies may suddenly become less difficult to accept. After all, these non-physical, non-“real” entities are no different from the Pokémons that the game software inserts into the camera, between us and our perception of physical reality. In the case of the Pokémon Go game, we know that there is software and a camera in the smartphone, but in the case of ghosts we are currently clueless.

What if that software had an analogue in the DNA sequence in our genome that affects our ability to perceive?  What if the camera had been a “contact lens” implanted in our eye at birth? Would we ever know that the Pokémon, or the ghost, doesn’t “really” exist?

This article was originally published in Swarajya, the magazine that reads India right

September 27, 2016

Cryptocurrency at Cypher2016

I was invited to Cypher 2016 - India's biggest Data Science conference organised by Analytics India Magazine at Bangalore where I spoke on Bitcoins, Blockchains and Cryptocurrency.

Here is the slidedeck ( please see in full screen )

and here are the videos

September 21, 2016

From Logic to Magic -- In Search of the Real

The better standards of living that many parts of the world enjoy are often traced back to the Renaissance in Europe that freed man from blind bondage to belief and allowed him to fly free on the wings of rational inquiry. These standards of course are defined in terms of material comfort -- food, clothing, shelter, safety and finally the leisure to explore the arts and the sciences. This leads to technical and administrative competence and the emergence of good governance that, in turn, loops back to create even higher standards of living. While every society desires this virtuous cycle, those that have aggressively adopted a scientific approach are the ones that have been successful in overcoming or converting others to their point of view. Spiralling out of Europe and reaching out into the depths of America, Africa, Asia and Australia, it has been the triumph of the rational way -- based on facts, axioms, logic and reason -- that delivers material comfort to the population.

But is there an alternative? A narrative that seeks to look past the last 500 years of rational science and instead, perceives the universe through intuition and imagination? The Sanskrit word for philosophy is darshan, the sight, the perception of the truth as seen through the mind’s eye of the Vedic seer! But any attempt to subscribe to such a Vedantic vision of the world is immediately criticised as being an anti-scientific, irrational regression into saffron stupidity. Where is the proof? So first, let us get that out of the way  …

Kurt Gödel was a mathematician, a colleague, confidant and contemporary of Albert Einstein at the Institute for Advanced Study at Princeton. He is remembered for his famous Incompleteness Theorem, which showed that any consistent system of axioms rich enough to express arithmetic will contain at least one statement that is true but not provable within that system. Thus provability is a weaker notion than truth. Gödel’s Theorem knocks out the philosophical foundations of the edifice of modern mathematics, carefully crafted by Euclid with his axioms and proofs. Gödel showed how his theorem was applicable to mathematics in general and arithmetic in particular. This implies that if there are statements that are true but not provable even in a science as well structured as arithmetic, then there should be no difficulty in accepting the same in more complex, subjective philosophical systems. Lack of proof is no longer an excuse to deny the truth of what is otherwise intuitively obvious!

Now that the need for a proof is out of the way, let us explore some interesting ideas ..

Sankaracharya,  who churned out the philosophy of Advaita Vedanta from the ocean of the Vedas and the Upanishads and created the most popular philosophical basis of Hinduism, states that the physical world is an illusion -- Brahma satyam jagat mithya, jivo brahmaiva naparah. Prima facie, this sounds absurd. How can the world that I see -- and touch, feel, experience -- around me be not real? Even if we ignore our senses, we have equipment in our laboratories that can record enough evidence from deep inside atoms to the outer edges of the galaxy and the universe.

But consider virtual worlds of the kind described in the movie The Matrix, experienced in online games like World of Warcraft and now being rendered through virtual reality devices like Oculus Rift, Samsung Gear and Microsoft Hololens. Technology can blur the boundary between the real and the virtual but we may still claim the satisfaction of knowing that, in principle and in theory, it is possible to distinguish one from the other. But this satisfaction is short lived. Nick Bostrom, of Oxford University, in a paper published in 2003, puts forward the simulation hypothesis that argues quite convincingly that it is not impossible that the world that we inhabit is indeed a simulation (or “Sim”) that is being run on a digital computer in the multiverse. So the physical universe as we know it today becomes one of the many “Sims” in the multiverse operating at a higher dimension or plane of existence. Recently, Elon Musk has echoed a similar thought.

What exactly are these higher planes and dimensions? The normal reality that we are familiar with admits of three dimensions in space to which Einstein added a fourth dimension in time. String Theory, a descendant from the hoary lineage of Relativity, Quantum Mechanics and the Standard Model of particle physics, has made it quite respectable to consider the universe to have 10 or even 26 dimensions that are curled up, or crushed into the four that we know. A 3D structure can be reduced to a 2D photograph (for the engineer, the plan and elevation!) that in turn can be rolled up into a thin, 1D tube. Information is lost when dimensions are reduced and may be recovered when the original dimensions are unrolled. Edwin Abbott’s 1884 novella, Flatland, a satire on 19th century social issues that spanned across 1D, 2D and 3D worlds, was one of the first modern texts that explored the mathematical novelty that crops up in multi-dimensional String Theory today.

Unlike Relativity and Quantum Mechanics, String Theory is not yet proven and is unlikely to be in the near future, but that does not make it untouchable in academic circles. So is the case with the simulation hypothesis that opens the doors to the multiverse. The illusory world of Maya that Sankara posits is indeed no different either.

Sankara talks about the primacy of the Brahman, the primordial, conscious sentience that is the only reality of the universe. Sentience and its close cousin, intelligence, are functions of information exchange, and information science can play an interesting role in exploring this area. For example, life as we know it is a manifestation of the information stored in the genetic code. The medium of storage, the DNA molecule, is physical and degradable but the personality, the spirit, the Atman, that is encoded in the gene is transcendental, immortal and transferable. You can destroy a paper book but not forget the classic story that was written on it! The story is independent of the physical book. From this perspective, the idea of an immortal Atman that evolves across multiple physical incarnations until it achieves identity with the Brahman certainly sounds feasible, much to the chagrin of the dyed-in-the-wool rationalist.

That information is the key to a fundamental understanding of the real world is a hot new topic in current physics. According to the MIT Technology Review “Some physicists are convinced that the properties of information do not come from the behaviour of information carriers such as photons and electrons but the other way round. They think that information itself is the ghostly bedrock on which our universe is built”. Based on the work of Erik Verlinde, of the University of Amsterdam, who showed that the Laws of Gravity can be derived from the Laws of Thermodynamics, Jae-Weon Lee of Jungwon University, South Korea has shown how gravity can be related to quantum information. Of course, the information that they talk about is not the kind stored in books and computer disks but is defined in terms of symbols, sequences, probabilities and eventually entropy.

Information plays a key role in the description of both the cognizant intelligence that lies at the heart of reality as well as the physical depiction of this reality. Information is both the spirit as well as the body that is temporarily attached to it. Access to this information is the key to experience sat-chit-anand, the real and conscious bliss, that pervades and defines existence. Ekam sat vipra bahudha vadanti - Truth is one but many are the paths to it. Traditional science with its emphasis on experimental rigour and rationality is certainly a useful tool but the direct experience, born of meditation and leading to enlightenment, is an equally viable way to reach the same goal.

The Nasadiya Sukta of the Rig Veda 10:129 asks
But, after all, who knows, and who can say
Whence it all came, and how creation happened?

To which the classic textbook on Physics by Resnick and Halliday answers with a quote from the English poet W.B.Yeats saying that “the world is full of magical things patiently waiting for our wits to grow sharper”. Implicit in this statement is the fundamental premise of science that the world is understandable. The alternate premise is that the world is experienceable. Before the Mahabharata war, Krishna is seen trying to convert Arjun to his point of view. But after the first ten chapters of the Bhagavad Gita, Krishna realises that his logic has failed. Then he invokes the magic of a direct experience of the Divine, the Vishwaroop Darshan, in chapter eleven to show Arjun the reality and convince him to pick up his weapons again.

Logic and magic, reason and intuition, are two sides of the same coin that buys a ticket for the train that runs from darkness to light, from the illusion to the real.
this post was originally published in Swarajya, the magazine that reads India Right

August 22, 2016

Two Cheers, Not Three for Economic Liberalisation

1989 was a watershed year for both the world in general and me in particular.

I had just finished my PhD from the University of Texas at Dallas and had decided to break the jinx of the X+1 syndrome and return to India. Those who have been a part of the desi community in the US in the last century would recollect this strange yearning of those who had finally arrived in the US, not just physically, but metaphorically as well, to give it all up and return to India. Nostalgia for home, sprinkled with a sense of guilt for having abandoned it, competed with la dolce vita, the good life, that America held out to the F-1 visa community of graduate students and it was always the good life that won out. Most of the F-1 crowd would eventually get the Green Card, permanent immigrant status, and then become US citizens but they would always keep alive the delusion that next year, X+1, they would wind it all up and move back to India. It was a delusion because India was still stuck in socialist quicksand, where the cost of a new car was twenty five times the monthly salary of a fresh IIT B.Tech, while the corresponding factor in the US was three or four! Did I feel a turn in the wind? Did I suspect that things in India could change for the better? Perhaps I did or perhaps I was just foolish, but armed with a large hearted offer from Tata Steel I decided to pack up and return.

On the way back, my wife and I decided to use the $2200 windfall that I had just got by selling my Mazda 626 car to buy two tickets for a 15 day tour through Europe with the travel company Globus Gateway. Europe of course meant western Europe because the Iron Curtain of communism ensured that eastern borders could not be crossed very easily. Even within western Europe, we had to obtain seven separate visas for the countries that we would pass through. Nevertheless we eventually arrived in Paris on Bastille Day to realise that the world was celebrating 200 years of the French Revolution. But little did we know that three months later, while we would safely be in Jamshedpur by then, the world would see the spectacular fall of something that is closer to us in history -- the Berlin Wall.

The aftershocks of the fall of the Berlin Wall reverberated throughout the world and in a way led to the fall of India’s Soviet-era socialist economic model in 1991. Indians finally had the chance to participate in the global economy and today, the 2015 IIT graduate with his Rs 15 lakh placement package can finally think of a new car with only four months of his salary -- just as it was in the US in 1989! Some may of course wonder whether a new car is all that important for a fresh graduate but that is another question that can be debated elsewhere.

This summer, my wife and I were back in Europe, with our son, and with no Iron Curtain in the way, we decided to go through the great cities of Eastern Europe. Did I see anything different? Not really. As a tourist you visit palaces and churches, ride trams, take cruises and eat and drink in pubs and bars that have not really changed over the years. But the real change that I felt was in me -- and by extension, in other Indians. This was a direct outcome of the economic reforms that were kicked off in 1991 by the beleaguered Government of India in a desperate attempt to stave off the socialism inspired bankruptcy.

So what were these changes that I felt?

First was economic freedom. I had grown up in an upper middle class family in Calcutta, studied in a renowned school and we were financially well off, but my father could never dream of a family vacation in Europe! That was for “big businessmen like Tata, Birla”. This has changed. The emergent middle class in India can now think big as well, not just in terms of vacations but in most of the good things in life. No longer do we wait for our relatives to come back from foreign lands and hand out shampoo and soap!

What is more important is that our currency is recognised internationally. Before 1991 the rupee was worthless outside India. Getting “foreign exchange” for even the most mundane and legitimate purposes, like paying for the application fees for a US university, was a titanic struggle with forms to be filled in triplicate. Any foreign exchange in cash or cash equivalent travellers cheques had to be entered in the passport for subsequent scrutiny by vulturesque customs officers. Given the restrictions on getting foreign exchange and the meagre amounts that could be obtained -- unless of course you had the right connections -- travelling abroad was difficult. You had to think thrice before eating out at anything more expensive than a McDonald’s restaurant. But now our own Indian credit cards, issued by our own Indian banks, are readily accepted anywhere around the world and this was a very pleasant surprise for me. Conditioned as I have been to moving around with limited amounts of dollars, and keeping track of every cent that I was spending, the fact that I could access an ATM and withdraw euros, zlotys, forints and korunas directly from my rupee savings bank account in India was something that took me quite some time to get accustomed to.

The next big change is in telecommunications. I had grown up in an India where a phone was a luxury and one had to wait for years to get a connection. STD was unknown and trunk calls -- with variations like lightning calls and person-to-person calls -- were hideously expensive. Long distance calls within the US were quite reasonable but calling India from the US was frightful and one had to pay nearly US$ 5 per minute for calls, and that too when it was night in India. The first time I saw a fax -- it used to be called ZapMail -- was when the Swiss embassy suddenly wanted a copy of our air ticket before issuing a visa. Back in India, calling up my mother in Calcutta from Jamshedpur involved visiting a post office, standing in queue and paying Rs 100 in advance before the call could even be attempted. Mind you, it was attempted, not guaranteed to connect! Outside cities, telephones were impossible. I remember the wedding of a friend of mine where we accompanied the groom from Jamshedpur to Bokaro via Purulia and when faced with a sudden emergency, we could not make any kind of call until we had actually reached our destination.

This of course has changed beyond recognition. First, thanks to private players and wireless technology, getting a phone in India is just one KYC compliance away. Then you have VoIP technology like WhatsApp and Google Voice and when this is coupled to free WiFi services available at each and every hotel and restaurant in Europe, we were in constant touch with friends and family at virtually no cost.

Consumer goods, currency controls and communications -- since the heady days of 1991, all this has changed for the better in India, but what has not? Many things, including our attitude towards corruption and criminals in public life, but perhaps what is most obvious is India’s travel and transport infrastructure. While private airlines and app-based cabs cater to the requirements of the well heeled traveller, the common man is still at the mercy of inadequate and overcrowded public transport systems. In my current visit to Europe nothing showed this up more than the usage of trams in inner city transport.

Calcutta has a history of trams going back to 1902 and has the oldest running electric trams in Asia. But thanks to a combination of unfortunate incidents, including but not limited to the destruction of a large part of the rolling stock by the communists in the 1960s, the tram system is gasping for breath. Unimaginative planning, incompetent operations, venal politics and inevitable corruption have come together to destroy an elegant, inexpensive and non-polluting form of transportation. As a big fan of trams in Calcutta, I have often been told that trams are obsolete and are an anachronism in a modern city. But this year in city after city, in Berlin, Warsaw, Krakow, Brno, Budapest, Vienna and Prague, we saw how modern and sophisticated trams have been integrated with buses and even river boats to create an affordable and efficient public transport system. Why can we not build the roads and railways that this country so desperately needs?

What is lacking in India is neither technology nor capital but the ability, or perhaps the willingness, to put things together and craft an elegant solution that addresses basic infrastructural requirements. The economic reforms of 1991 may have uncorked the bottle of stifling socialism and released the genie but the genie is yet to master the magic that will create the right management structures not only for transportation but also for schools, hospitals, municipalities, courts of law, law enforcement, tax collection and in fact for the entire infrastructure of governance and public services.

The reforms of 1991 might have vindicated my 1989 decision to return to India because in purely economic terms, India today offers opportunities to achieve and maintain a standard of living that is comparable to what was possible in the United States. But what the reforms have left incomplete are the corresponding changes in governance procedures. With people and their mindset remaining the same, the only way to upgrade this infrastructure of governance is perhaps to reduce the discretionary role of humans and move over to a more systems driven approach to governance. As argued by this author in the May 2015 issue of this magazine, we need to leverage technology and modern management techniques to the hilt and use them to overcome deficiencies caused by people. Unless this happens, and happens very quickly, future generations of Indians will once again think of India as not a good place to go, or even return, to but perhaps just a great place to have come from.

And till then, it is only two cheers for economic liberalisation!
This article was originally published in Swarajyamag - the Magazine that reads India right

July 21, 2016

The Second Book on the Third Wave

Steve Case is such a big fan of Alvin Toffler’s 1980 classic, The Third Wave, that when he pens his own memoirs he gives them the same title. In his seminal work, Toffler had identified three distinct waves in the evolution of human society as the world moved from agriculture, through industry, to become a post-industrial information driven society. Steve divides Toffler’s third wave -- the information phase -- into three sub-waves and then examines the third of this third in greater detail.

In addition to being his memoirs, that chronicle the rise and fall of America Online, the company that really got Americans hooked to the internet, there are two other distinct themes that Steve has woven into this easy to read book. First, he wants to be mentor and cheerleader for the entrepreneur who has an idea to change the world and does not know how to go about it. The second, and this is his pet theme, is the distinction between the first, second and third waves, or sub-waves, of the internet driven economy that dominates the world today.

For Steve, the first wave, in which he and his company AOL played a very significant role, was all about the setting up of the infrastructure of the internet and world wide web. This wave collapsed in the dot com bust and was followed by the second wave of applications -- Google for search, Facebook for social media, WhatsApp for communications, Amazon for commerce. The key difference between the first and the second wave was that the second was driven by individuals, or small groups, using cutting edge technology while the first wave was not so much about innovative technology as about clever collaborations and partnerships. Steve admits as much when he says that “AOL was not alone in believing in the idea of the Internet but we outhustled and outexecuted our competitors. The big companies like IBM and GE, should have prevailed, but they didn’t. Their lack of agility and entrepreneurial passion and culture hobbled them.”

The third wave will finally see the internet delivering on the promise of universal connectivity that its evangelists have been talking about since its early days. IoT -- the Internet of Things -- will connect every device from the car to the toaster, the smartphone to the refrigerator, the powerplant to the electric switch through the internet and deliver innovative, useful services seamlessly. Steve believes that this connectivity will be so ubiquitous that the phrase internet enabled device will be as irrelevant as, say, an electricity enabled washing machine. Being connected to the internet will be the default and not a novelty or a USP. This will also ensure that third wave companies, and applications, will not create new or unusual business opportunities but will streamline, and make more efficient, existing mainstream businesses like healthcare, education and agriculture that form the backbone of the global economy today.

Steve believes that the key and crucial differentiator for the third wave companies will be, like the first wave once again, partnerships. Unlike Elon Musk or Jeff Bezos, Steve is no champion for new or groundbreaking technology. Instead he believes that the success of the third wave entrepreneur will lie in stitching together a network of alliances and partnerships across three kinds of entities, namely, technology creators, mainstream businesses and government agencies. Knowwho will take precedence over knowhow. Unlike most technophiles, Steve believes that government can and must be trusted and, however difficult it may be, the entrepreneur must walk that extra mile to take government along if he wants to succeed. Entrepreneurs in India would, I am sure, wholeheartedly go along with this sentiment since they know very well that in India managing the government is more important than technology or management systems.

As an extension of his recurring belief in the value of partnerships -- “If you want to go quickly, go alone, if you want to go far, go together” -- Steve has a team of professionals from West Wing Writers to sift through his speeches, distill the wisdom and package it into this nice book. But as with anything created by a committee, the result, while being faultless to a point, lacks the brilliance of original ideas or the elegance of literary craftsmanship! Entrepreneurs however will see in Steve an image of their days of struggle, learn about the importance of networking and partnering with government and be motivated to jump into the third wave of the digital society that is already cresting around us today.

Originally published in the August 2016 issue of Swarajya, the magazine that reads India right!

June 30, 2016

Build your website at the lowest cost

This blog post will show you how to create a fairly decent website at a guaranteed lowest cost and that too without writing any code. Take a look. This post was originally written for iot-hub and the approach is currently being used at Yantrajaal as well. However, when Yantrajaal was created in 1999, none of these technologies existed and I had to take a more expensive route, that you do not need today.

The first step to creating your, or your company's, digital identity is to build a website. Most people begin by purchasing web hosting services either from a web hosting company or from a value added reseller and have them build the website. While this may be fine, a do-it-yourself approach will get you going at the minimum possible cost. This post will tell you how you can do this.

1. Purchase a domain name from a domain registrar like TierraNet or any other similar company. This will cost you around US$ 14 / year. You can get an absolutely free domain name from Freenom but these domains will end with .tk, .ml, .ga, .cf, .gq and not with the usual .com, .net, .org etc. Irrespective of where you purchase your domain from, make sure that you have complete access to configure or modify the DNS records corresponding to your domain, preferably through a graphical interface.

2. For hosting your website you have two options. (a) Get a traditional web server from a hosting company like x10hosting, which could be free or have a monthly charge. Make sure that you have access to the cPanel application to manage your website. (b) The other option is to use Google's Blogger platform. Unless you want to build a transactional website with PHP-MySQL (or equivalent) support, the Blogger platform is far easier to work with and is an excellent starting point. The Blogger option is strongly recommended.

This post assumes that you have chosen the Blogger option.

3. Create a blog by following instructions given in this tutorial. For the name of the blog, choose the same character string as you have for the domain. If your domain is xyz.com then your blog should be xyz.blogspot.com. This is not essential but is a nice to have feature.

4. A blog looks different from a traditional website because unlike the latter it does not have a fixed home page nor does it have a set of navigational tabs across the top. To get over this problem, follow instructions given in this post.

5. Now you need to link your domain xyz.com to your blog xyz.blogspot.com. For this you need to log in to your domain registrar account (created in step 1) and then navigate to the screen that allows you to manage the DNS. There will be DNS records that need to be added or modified. To do so, you need to go to this Google Support Site and follow the instructions there. Remember, you are modifying the top level domain, that is www.xyz.com, and not a subdomain like foo.xyz.com, so choose the appropriate instructions. [New] The Google Support Site is good for a top level domain like www.xyz.com. However if you already have a site like www.xyz.com and you wish to create a subdomain like icecream.xyz.com, then you should follow the simpler instructions at smallbusiness.chron.com.

The DNS records that you add may conflict with existing DNS records that the registrar would have provided by default (and that point visitors to a default, under construction website). If in doubt, keep a screenshot of the earlier records and delete all of them. The new records should do the work. Anything to do with DNS servers takes some time to take effect. So after completing this step, go away and do something else for three or four hours (though Google claims that it will take 24 hours) and then see if you can load your website at www.xyz.com. If everything is OK, you should see your blog.
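
If you would rather check from a script than keep reloading the browser, the following is a minimal sketch of such a check. It only asks the operating system what the name currently resolves to; it does not verify that the records match what Google asks for. The function name describe_dns and the lookup parameter are my own inventions for illustration, and www.xyz.com is the placeholder domain used in this post.

```python
import socket


def describe_dns(hostname, lookup=socket.gethostbyname_ex):
    """Return what `hostname` currently resolves to, or None if it does not resolve yet.

    `lookup` is a hypothetical hook so that the function can be tried without
    a network; by default it is the standard library's gethostbyname_ex.
    """
    try:
        canonical, aliases, addresses = lookup(hostname)
    except socket.error:
        # The name is not known (yet); DNS changes may still be propagating.
        return None
    return {
        "canonical": canonical,          # the name the lookup finally landed on
        "aliases": list(aliases),        # other names met along the way
        "addresses": sorted(addresses),  # IPv4 addresses, sorted for stable output
    }
```

Calling describe_dns("www.xyz.com") with your own domain in place of xyz.com returns None until the new records are visible from your machine, and a small dictionary of names and addresses once they are.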

June 25, 2016

Spark, Python & Data Science -- Tutorial

Hadoop is history and Spark is the new kid on the block who is the darling of the Big Data community. Hadoop was unique. It was a pioneer that showed how "easy" it was to replace large, expensive server hardware with a collection, or cluster, of cheap, low end machines and crunch through gigabytes of data using a new programming style called Map-Reduce that I have explained elsewhere. But "easy" is a relative term. Installing Hadoop or writing the Java code for even simple Map-Reduce tasks was not for the faint hearted. So we had Hive and Pig to simplify matters. Then came tools like H2O and distributions like Hortonworks to make life even simpler for non-geeks who wanted to focus purely on the data science piece without having to bother about technology. But as I said, with the arrival of Spark, all that is now history!

Spark was developed at the University of California at Berkeley and appeared on the horizon for data scientists in 2013 at an O'Reilly conference. You can see the presentations made there, but the following one will give you a quick overview of what this technology is all about.

But the three real reasons why Spark has become my current heart-throb are
  1. It is ridiculously simple to install. Compared to the weeks that it took me to understand, figure out and install Hadoop, this was over in a few minutes. Download a zip, unzip, define some paths and you are up and running.
  2. Spark is super smart with memory management and so, unlike Hadoop, starting Spark on your humble laptop will not kill it. You can keep working with other applications even when Spark is running -- unless of course you are actually crunching through 5 million rows of data. Which is what I actually did, on my laptop.
  3. And this is the killer. Coding is so simple. The 50 lines of Java code -- all that public static void main() crap -- needed in Hadoop reduce to two or three lines of Scala or Python code. Seriously, no joking.
And unlike the Mahout machine learning library of Hadoop that everyone talked about but no one could really use, the Spark machine learning library, though based on Mahout code, is something that you can be running at the end of this tutorial itself. So enough of chit-chat, let us get down to action and see how easy it is to get going with Spark and Python.

In this post, we will show how to install and configure Spark, run the famous WordCount program so beloved of the Hadoop community, run a few machine learning programs and finally work our way through a complete data science exercise involving descriptive statistics, logistic regression, decision trees and even SQL -- the whole works.

Though, in principle, Spark should work on Windows, the reality is that it is not worth the trouble. Don't even try it. Spark is based on Hadoop and Hadoop is never very comfortable with Windows. If you have access to a Linux machine, either as a full machine to yourself or one that has a dual boot with Windows and Linux, then you may skip section [A] on creating virtual machines and go directly to section [B] on installing Spark.

Also please understand that you need a basic familiarity with the Linux platform. If you have no clue at all about what "sudo apt-get ..." is, or have never used "vi" or an equivalent text editor, then it may be a good idea to have someone with you who knows these things during the install phase. Please do understand that this is not like downloading an .exe file in Windows and double-clicking on it to install a piece of software. But even if you have only a rudimentary understanding of Linux and can follow instructions, you should be up and running.

A] Creating a Virtual Machine running Ubuntu on Windows

If your machine has only Windows -- as is the case with most Windows 8 and even Windows 10 users -- then you will have to create a Linux Virtual Machine and carry out the rest of the exercise on the VM. This exercise was comfortably carried out on a laptop with 8GB RAM but even 6GB should suffice.

  1. Download Oracle VirtualBox [ including Extension pack ] software for Windows and install it on your Windows machine.
  2. Download an Ubuntu image for the Virtual Box. Make sure that you get the image for the VirtualBox and not the VMware version! This is a big download, nearly 1GB and may take some time. What you get is a zip file that you can unzip to obtain a .vdi file, a virtual disk image. Note the userid, password of the admin user that will be present in the VM [ usually userid is osboxes and password is osboxes.org, but this may be different ]
  3. Start the VirtualBox software and create a new virtual machine using the vdi file that you have just downloaded and unzipped. You can give the machine any name but its type must be defined as Linux, Ubuntu.
    1. If you are not sure how to create a virtual machine, follow these instructions. Remember to allocate at least 6GB RAM to the virtual machine.
    2. If your machine is 64 bit but VirtualBox is only showing 32 bit options then it means that virtualization has been disabled on your machine. Do not panic, simply follow the instructions given here. If you don't know how to boot your machine into the BIOS then see boot-keys.org
    3. Once your Ubuntu virtual machine starts, you will find that it runs in a small window and is quite inconvenient to use. To make the VM occupy the full screen you would need to install Guest Additions to VirtualBox by following the instructions given here [ sudo apt install virtualbox-guest-additions-iso ] followed by loading the CD image as explained here.
    4. In the setup options of the VM you can define shared folders between the Windows host OS and the Ubuntu guest OS. However the shared folder will be visible but not accessible to the Ubuntu userid until you do this.
    5. Steps 3 and 4 are not really necessary for Spark but if you skip them you may find it difficult or uncomfortable to work inside a very cramped window
  4. Strangely enough, the VM image does not come with Java, that is essential for Spark. So please install Java by following these instructions.
Ubuntu is so cool! Who wants Windows?

B] Install Spark

Once we have an Ubuntu machine, whether real or virtual, we can now focus on getting Python and Spark.
  1. Python - The Ubuntu 16.04 virtual machine comes with Python 2.7 already installed and is adequate if you want to use Spark at the command line. However if you want to use iPython notebooks [ and our subsequent tutorial needs notebooks ] it is better to install the same.
    1. There are many ways to install iPython notebooks but the easiest way would be to download and install Anaconda
      1. Note that this needs to be downloaded inside the Ubuntu guest OS and not the Windows host OS if  you are using a VM.
      2. When the install script asks if Anaconda should be placed in the system path, please say YES
    2. Start python and ipython from the Ubuntu prompt and you should see that Anaconda's version of python is being loaded.
  2. Spark - the instructions given here have been derived from this page but there are some significant deviations to accommodate the current version of the ipython notebook.
    1. Download the latest version of Spark from here.
      1. In the package type DO NOT CHOOSE source code as otherwise you will have to compile it. Choose instead the package with the latest pre-built Hadoop. 
      2. Choose direct download, not a mirror.
    2. Unzip the tgz file, move the resultant directory to a convenient location and give it a simple name. In our case it was /home/osboxes/spark16
    3. Add the following lines to the end of file .profile
      1. export SPARK_HOME=/home/osboxes/spark16
      2. export PATH=$SPARK_HOME/bin:$PATH
      3. export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
      4. export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
        1. to get the correct version of the py4j-n.n-src.zip file go to $SPARK_HOME/python/lib and see the actual value
        2. the last two paths are required because in many cases the py4j library is not found
    4. To start spark in the command line mode enter the "pyspark" command and you should see the familiar Spark screen. To quit enter exit()
    5. To start spark in the ipython notebook format enter the command IPYTHON_OPTS="notebook" pyspark at the shell prompt. Please note that the older strategy of using profiles for starting the ipython notebook may not work, as the current version of jupyter does not support profiles anymore, and hence this approach was used. This will start the server and make it available at port 8888 on the localhost. To quit press ctrl-c twice in quick succession.

    6. An alternative way of starting the notebook, not involving the IPYTHON_OPTS command is shown here. This is easier
      1. Start the notebook with $ ipython notebook ( or alternatively, $ jupyter notebook )
      2. Execute these two lines from the first cell of the notebook
        1. from pyspark import  SparkContext
        2. sc = SparkContext( 'local', 'pyspark')
  3. Now that we have Spark running on our Ubuntu machine, check out the status at http://localhost:4040
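
The two PYTHONPATH lines in .profile depend on the exact name of the py4j zip file, which changes from one Spark release to the next. The following is a minimal sketch of how the same two paths could be worked out from Python instead of being typed by hand. The function name spark_python_paths is hypothetical and not part of Spark; the only assumption is the $SPARK_HOME/python/lib layout described above.

```python
import glob
import os


def spark_python_paths(spark_home):
    """Return the two directories/archives that the PYTHONPATH lines point at.

    Looks for py4j-<version>-src.zip under <spark_home>/python/lib, so the
    version number does not have to be known in advance.
    """
    python_dir = os.path.join(spark_home, "python")
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise IOError("no py4j-*-src.zip found under " + os.path.join(python_dir, "lib"))
    # If several versions are lying around, take the last one in sorted order.
    return [python_dir, matches[-1]]


# Typical use inside a notebook cell, before importing pyspark
# (left as comments because it needs a real Spark installation):
#   import sys
#   sys.path[:0] = spark_python_paths(os.environ["SPARK_HOME"])
#   from pyspark import SparkContext
```

This is only a convenience; if the export lines in .profile already work for you there is no need for it.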

C] Running Simple programs

If you are not familiar with Python do go through the first few exercises of Learn Python the Hard Way, and if the concept of a notebook is alien to you then go through this tutorial.

Go to this page and scroll down to the section "Interacting with Spark" and follow the instructions there to run the WordCount application. This will need a txt file as input and any text file will do. If you cannot find a file, create one with vi or gedit and write a few sentences there and use it. Enter each of these lines as a command at the pyspark prompt

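text = sc.textFile("datafile.txt")      # RDD with one element per line of the file
print(text)
from operator import add
def tokenize(text):
    return text.split()                 # split a line into words on whitespace

words = text.flatMap(tokenize)          # RDD with one element per word
print(words)
wc = words.map(lambda x: (x, 1))        # pair every word with a count of 1
print(wc.toDebugString())
counts = wc.reduceByKey(add)            # add up the 1s for each distinct word
counts.saveAsTextFile("output-dir")     # write the result; the directory must not already exist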

The final output in Hadoop style will be stored in a directory called "output-dir". Remember Hadoop, and hence Spark, does not allow the same output directory to be reused.
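
To see what the flatMap, map and reduceByKey steps are doing, it may help to walk through the same logic on an ordinary Python list, without Spark. This is only an illustrative sketch: it runs on one machine, in memory, and the function word_count is a name made up for this example.

```python
from collections import defaultdict
from operator import add


def tokenize(line):
    """Split one line of text into words on whitespace."""
    return line.split()


def word_count(lines):
    """Count words in a list of lines, mirroring the three Spark steps."""
    # flatMap: every line becomes several words, flattened into one list
    words = [word for line in lines for word in tokenize(line)]
    # map: every word becomes a (word, 1) pair
    pairs = [(word, 1) for word in words]
    # reduceByKey(add): the 1s belonging to the same word are added together
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] = add(counts[word], one)
    return dict(counts)
```

For example, word_count(["to be or", "not to be"]) counts "to" and "be" twice each and "or" and "not" once each. Spark does the same thing, except that the lines, words and pairs are spread across the machines of the cluster.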

The same commands can also be entered one by one in the ipython notebook and you would get the same result

This establishes that you have Spark and Python working smoothly on your machine. Now for some real data science

D] Data Science with Spark

[New 24Jul16] Unlike Hadoop / Mahout, the machine learning library of Spark is quite easy to use. There are tons and tons of samples, including machine learning samples, available. These samples along with the sample data are also available in the Spark Home directory that gets created during the installation of Spark as described above. You can run these programs using the spark-submit command as explained in this page after making small changes to bring them into the format described on that page. The basic template for converting these samples to run with spark-submit and two sample programs for clustering and logistic regression are available for download here.

To understand the nuances of the MLlib library read the documentation and then, for example, follow the section on k-means. For more details of the API and the k-means models follow the links.
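
Before reading the k-means documentation it may help to see what the algorithm itself does. The following is a minimal sketch in plain Python of the usual assign-and-recompute loop, and it is not MLlib code. The function name kmeans, its parameters and the choice of the first k points as the starting centres are assumptions made purely for illustration; in MLlib the corresponding entry point is KMeans.train in pyspark.mllib.clustering, which has its own initialisation options.

```python
def kmeans(points, k, iterations=10):
    """Return k cluster centres for a list of equal-length numeric tuples."""
    # Assumption for this sketch: start from the first k points.
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move every centre to the mean of its cluster.
        new_centers = []
        for members, old in zip(clusters, centers):
            if members:
                size = float(len(members))
                new_centers.append(
                    tuple(sum(m[i] for m in members) / size for i in range(len(old)))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its old centre
        if new_centers == centers:
            break  # nothing moved, so the centres have converged
        centers = new_centers
    return centers
```

With two obvious groups of points, such as a couple near (0, 0) and a couple near (9, 9), and k set to 2, the centres settle in the middle of each group after one or two passes.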

Jose A Dianes, a mentor at codementor, has created a very comprehensive tutorial on data science and his ipython notebooks are available for download at github. This uses actual data from a KDD cup competition and will lead the user through

  • Basics of RDD datasets
  • Exploratory Data Analysis with Descriptive Statistics
  • Logistic Regression
  • Classification with Decision Trees
  • Usage of SQL
After going through this tutorial, one will have a good idea of how Spark and Python can be used to address a full cycle data science problem right from data gathering to building models

Spark is a part of the curriculum in the Business Analytics program at Praxis Business School, Calcutta. At the request of our students I have created an Oracle Virtual Appliance that you can download [ 4GB though ], import into your VirtualBox and then go directly to section [D]! No need for any installation and configuration of Ubuntu, Java, Anaconda, Spark or even creating the demo MLlib code. This VM has been configured with 4GB RAM, which just about suffices. Increase this to 6GB if feasible. -- Updates : [28Aug16] - New Virtual Box (password = "osboxes.org")

I was invited to Cypher2016 where I delivered a lecture and demonstration on Python, Spark, Machine Learning and running this on AWS
