I’m getting quite excited by the ability to see things in data at scale that we might never previously have been able to see. At times, “big data” sounds like one of those meaningless phrases that get thrown around or tacked on to something else to justify budgets (see also “cyber”). However, if you will bear with me for a moment or two, I would like to share with you both a TEDx Talk I saw recently and my thoughts about it. So first, the talk:
Jeremy Howard talks about some of the implications of Machine Learning. Specifically, he talks about how with enough data to learn from, computers are getting almost as good as us at many tasks, and pretty soon they’ll probably be better than us. Significantly better than us.
Machine Learning is here, and it is advancing and becoming more efficient at an exponential rate. It can do in minutes what, less than a generation ago, could have taken over 30 man-years to do manually. It threatens the very foundation of our current economy. But then again, the candle manufacturers thought the same about light bulbs, and we survived.
However, what got me thinking was the potential scale of the positives. It’s good at spotting cancer, it makes more accurate weather predictions, and it may be very useful for law enforcement:
Take the example of an image depicting child sexual exploitation. The background may tell you something about where the image was taken. The perpetrator may have tried to sanitise the image, and they almost certainly will have removed the EXIF data. However, some things will still be there: the light levels in the room, the colour temperature, the dimensions, distinct markings, carpet type and colour, and so on. These can all be considered data points, markers that a computer could see and understand. Now imagine that the previous owner of that house had uploaded a video to YouTube from within that room, or once had a party in the room and uploaded pictures to Facebook. There are (at the time of writing) approximately 300 hours of new content uploaded to YouTube every minute, meaning that even with a workforce of tens of thousands, it is entirely impossible for humans to understand the full volume of the content. That, however, isn’t true of Machine Learning. To such a system, recognising a location based on a previous video or photo it had seen would be like a human seeing a picture of somewhere they had already been. A powerful Machine Learning system could identify the location, or at least provide an excellent investigative lead to law enforcement. (Indeed, a less granular version of this type of Machine Learning application has already been demonstrated by Google.)
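To make the idea of "data points" a little more concrete, here is a minimal sketch of how matching one image against previously seen content might look. It assumes that each image has already been boiled down to a short list of numbers (light level, colour temperature, carpet hue and so on, each scaled to between 0 and 1). The feature values, the names `KNOWN`, `QUERY` and `best_match`, and the choice of cosine similarity are all my own illustrative assumptions; a real system would learn far richer features and would not necessarily compare them this way.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def best_match(query, known):
    """Return (name, score) for the known item most similar to the query."""
    name, features = max(
        known.items(), key=lambda item: cosine_similarity(query, item[1])
    )
    return name, cosine_similarity(query, features)


# Hypothetical, hand-made feature vectors for previously seen content:
# [light level, colour temperature, carpet hue, room proportions]
KNOWN = {
    "video_a": [0.82, 0.40, 0.10, 0.65],
    "photo_b": [0.20, 0.90, 0.75, 0.30],
    "video_c": [0.50, 0.50, 0.50, 0.50],
}

# Hypothetical features extracted from the image under investigation.
QUERY = [0.80, 0.42, 0.12, 0.60]

if __name__ == "__main__":
    name, score = best_match(QUERY, KNOWN)
    print(f"Closest previously seen item: {name} (similarity {score:.3f})")
```

With these made-up numbers the closest item is `video_a`, which is the sort of result that might be handed to an investigator as a lead rather than treated as proof. The hard part in practice is producing good features in the first place and searching billions of items quickly, neither of which this sketch attempts.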
On the flip side, however, it has massive potential for the erosion of privacy. A system that can learn to recognise people and places can become a very powerful tool for oppressive surveillance in the wrong hands. I am, I must admit, both concerned and excited about the future path of these technologies. Concerned, because I have noted with dismay that the law moves slower than technology and thus may take its time to regulate these technologies. Excited, because through these technologies we have the potential to learn new things and accelerate the rate of our technological development. I just hope I’m worrying in the same way that a knocker-up may have done about the introduction of cheap and reliable alarm clocks.