Posts

Distance Learning - Reloaded

Distance learning over the internet, which lets students anywhere in the country learn from the best teachers, is old news! We have heard of Massive Open Online Courses (MOOCs) popularised by Coursera and Udemy, applauded the good work done by the Khan Academy and proudly talked about the gigabytes of rather boring videos uploaded to YouTube by IIT professors under the NPTEL program. But none of this has had any impact on the critical skill gap that separates students who pass out of India’s colleges from the jobs that await them in a booming economy. Institutions like the IITs, IIMs, NITs, Presidency and others may be doing well, but there are another 600 degree-granting institutions with more than 35,000 affiliated colleges that have lost the plot completely. Most of these have inadequate infrastructure, teachers who are barely competent or rarely in the classroom, outdated syllabi and an academic atmosphere vitiated by student politics and yet it ...

From Hadoop Streaming to RHadoop

The challenge of combining the statistical power of R and the "Big Data" capabilities of Hadoop is something that has always fascinated me. Over a year ago, I finally broke free from the stupidity of the WordCount (and various other counting) programs and tried to solve a real-life retail problem with linear regression using R and Hadoop. This is documented in my blog post Forecasting Retail Sales -- Linear Regression with R and Hadoop. In that case, however, I had used the Hadoop streaming API to call two separate R programs. Subsequently I came across the Hortonworks HDP platform, which dramatically simplified the process of installing and running Hadoop. This is explained in my blog post Big Data for the Non Geek, where in addition to installing Hadoop, I have also explained how to overcome the challenges of installing the RHadoop packages on top of Hadoop on the Hortonworks platform. Hortonworks has a nice example of how to run an RHadoop program on the HDP...
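As a rough illustration of the streaming approach mentioned above, where one program acts as the mapper and a second as the reducer, here is a minimal sketch in Python rather than R. It is not the code from the original post: the input layout (`store,x,y` per line), the function names and the choice of a simple one-variable regression per store are all assumptions made for the example. The mapper emits partial sums for each record and the reducer combines them into a least-squares slope and intercept per key.

```python
from itertools import groupby


def mapper(lines):
    """Map step: turn each 'store,x,y' line (hypothetical format) into
    (store, (count, x, y, x*x, x*y)) so the reducer only needs to add."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        store, x, y = line.split(",")
        x, y = float(x), float(y)
        yield store, (1, x, y, x * x, x * y)


def reducer(pairs):
    """Reduce step: sum the partial statistics per store and solve the
    closed-form least-squares equations for slope and intercept."""
    # Hadoop sorts mapper output by key before the reducer sees it;
    # sorted() stands in for that shuffle-and-sort phase here.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for store, group in groupby(ordered, key=lambda kv: kv[0]):
        n = sx = sy = sxx = sxy = 0.0
        for _, (c, x, y, xx, xy) in group:
            n += c
            sx += x
            sy += y
            sxx += xx
            sxy += xy
        denom = n * sxx - sx * sx
        if denom == 0:
            continue  # all x values identical: slope is undefined
        slope = (n * sxy - sx * sy) / denom
        intercept = (sy - slope * sx) / n
        yield store, slope, intercept


if __name__ == "__main__":
    sample = ["A,1,3", "A,2,5", "A,3,7", "B,1,10", "B,2,8"]
    for store, slope, intercept in reducer(mapper(sample)):
        print(f"{store}\tslope={slope:.2f}\tintercept={intercept:.2f}")
```

In a real streaming job the two functions would live in separate scripts that read from standard input and write tab-separated lines to standard output; they are kept in one file here only so the sketch can be run locally.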

Teachers at St. Xavier's Calcutta - circa 1970

Before these names fade out of memory ...
Front row: Mr Subramanium, Mr Tony D'Abrew, Fr Bouche, Fr Wavreil, Fr Desbrulais, Mr Verma, Mr Dey (Goba)
Second row: Mr Nelson, X, Mr Sajal Banerjee, Mr Kamalendu Chaudhury[?], Mr Engineer, Mr Abraham[?], Mr Leslie Davey, Mr Mishra, Mr A K Samajpati, Mr A P Sarkar, Mr Carlyle Rosario, Mr Sushil Sarkar[?], Fr Maliyekal[?], Mr Nemai Sengupta[?], Mr Lobo, Mr Les D'Gama, X, X, Dr Magno Correa
Back row: Mr Redden, Mr T Vianna, Mr Gomes[?], Mr Rai[?], Mr Ganga Singh, Mr Balai Banerjee[?], Mr Chittaranjan Roy, Mr Tripathi, Fr L Hous, Mr Pinto, Mr Brown, Mr Gass
Original picture with tags available at Les D'Gama's Facebook timeline

Technology, Management & Systems : The Holy Trinity ?

India has a strange fascination for engineers and MBAs. Everybody wants to become one, or preferably both. The same goes for systems, or as they say in India, the IT sector. But this fascination is not because of any natural aptitude for these disciplines but simply because they help one to get a job in an otherwise dismal economy. This is unfortunate, because if we step back for a moment and think through the issues that haunt this country, it would seem that our salvation lies in leveraging this holy trinity to dig ourselves out of the hole that we find ourselves in! Let us consider a few exemplary scenarios. Till the 1990s, telephones in India were a disgrace. While landline technology was readily available and widely used all across the developed world, we were still at the mercy of the corrupt and inefficient P&T department that ensured that very few of us had access to one. This changed dramatically with the arrival of cell-phones that bypassed the constraints of th...

Maps of India : DIY with R and GADM data

Displaying spatial data on maps is always interesting, but most visualisation tools do not offer facilities to create maps of India, especially at the state and lower levels. In this post, we will show how such maps can be made. The base data for such maps, the "polygons" that define the country, the states, the districts and even the talukas (or sub-divisions), is available from an organisation called Global Administrative Areas, or gadm.org. Country-level files for almost all countries are available in a variety of formats, including R, and these are at three different levels. For India, these files can be downloaded as IND_admN.RData where N = 1,2,3. These will form the raw data from which we will create our maps. Unfortunately, the GADM files represent a truncated Kashmir. How I wish that the Government of India and the National Atlas and Thematic Mapping Organisation would publish similar files for us. Anyway, we work with what we readily have ... Working with R, w...

Mapping Money Movements to Trap Corruption

In the chequered history of parliamentary legislation in India, the RTI Act stands out as a significant milestone that puts the activities of the government under public scrutiny. But even though the Act gives citizens a legitimate platform to ask questions, the process is cumbersome and answers are often given in a manner that is not easy to understand or make use of. And why must a citizen ask for something that is his by birthright? Why can the information not be released automatically? But then who decides what information is to be released? At what level of detail? At what frequency? The biggest challenge facing India is corruption. It is the mother of all problems because it leads to and exacerbates all other problems. If controlled, the money saved can be used to address most deficiencies in health, education and other social sectors. Misguided people wrongly believe that having a strong Lokpal will solve the problem but when bodies as powerful as the CB...

Big Data for the non-Geek : Hadoop, Hortonworks & H2O

[Note] Hadoop, IMHO, is history. Rather than waste time with all this, I suggest you check out my blog post on Spark with Python. Hadoop is a conceptual delight and an architectural marvel. Anybody who understands the immense challenge of crunching through a humungous amount of data will appreciate the way it transparently distributes the workload across multiple computers and marvel at the elegance with which it does so. (Image from nextgendistribution.com.) Thirty years after my first tryst with data, in the form of the relational database management systems that I had come across at the University of Texas at Dallas, my introduction to Hadoop was an eye-opener into a whole new world of data processing. Last summer, I managed to Demystify Map Reduce and Hadoop by installing it on Ubuntu and running a few Java programs, but frankly I was more comfortable with Pig and Hive, which allowed a non-Java person (or pre-Java dinosaur) like me to perform meaningful tasks with Map-R...