Special Edition is the blog for security testing business SE Labs. It explains how we test security products, reports on the internet threats we find and provides security tips for businesses, other organisations and home users.

Monday, 28 November 2016

What is Machine Learning?

What is machine learning, and how do we know it works?

What's the difference between artificial intelligence and machine learning? Put simply, artificial intelligence is the area of study dedicated to making machines solve problems that humans find easy but digital computers find hard, such as driving cars, playing chess or recognising sarcasm. Machine learning is a subset of AI dedicated to developing techniques for making machines learn to solve these and other "human" problems without the insanely complex task of explicitly programming them.

A machine is said to learn if, with increasing experience, it gets better at solving a problem. Let's take identifying malware as an example. This is known as a classification problem. Let's also call into existence a theoretical machine learning program called Mavis. Consistent malware classification is difficult for Mavis because malware is deliberately evasive and subtle.

For Mavis to classify malware successfully, we need to show it a huge number of files that are known to be malicious. Once Mavis has digested several million examples, it should be an expert in what makes a file "smell" like malware.
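To make "learning from examples" a little more concrete, here is a minimal sketch of one of the simplest possible approaches, a nearest-centroid classifier. It is not how any real product works. The class name ToyMavis, the two feature scores per file and all the sample values are invented for illustration, and the sketch assumes Mavis is also shown known-clean files so that it has something to compare against.

```python
class ToyMavis:
    """Nearest-centroid classifier: a deliberately tiny stand-in for Mavis."""

    def __init__(self):
        self.sums = {}    # label -> running total of each feature
        self.counts = {}  # label -> number of examples seen

    def learn(self, features, label):
        """Fold one labelled example into that label's running average."""
        totals = self.sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        self.counts[label] = self.counts.get(label, 0) + 1

    def classify(self, features):
        """Return the label whose average example is closest to this file."""
        def squared_distance(label):
            centroid = [t / self.counts[label] for t in self.sums[label]]
            return sum((f - c) ** 2 for f, c in zip(features, centroid))
        return min(self.sums, key=squared_distance)


mavis = ToyMavis()

# Hypothetical features per file, each scaled 0..1:
# [how "packed" the file looks, how many suspicious calls it makes]
for features in ([0.9, 0.8], [0.8, 0.9], [0.7, 0.7]):
    mavis.learn(features, "malicious")
for features in ([0.1, 0.2], [0.2, 0.1], [0.3, 0.3]):
    mavis.learn(features, "clean")

# A disguised sample whose "smell" is faint.
disguised = [0.45, 0.45]
before = mavis.classify(disguised)  # "clean": a misclassification

# More experience: two further disguised samples, labelled as malicious.
mavis.learn([0.4, 0.45], "malicious")
mavis.learn([0.45, 0.4], "malicious")
after = mavis.classify(disguised)   # "malicious"
```

With only the obvious samples to go on, the toy misses the disguised file. After seeing two more labelled examples it catches it, which is the sense in which a machine "gets better with experience". Real systems use far richer features and far more data.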

The spectrum of ways in which Mavis might be programmed to learn this task is very wide indeed, and filled with head-spinning concepts and algorithms. Suitable approaches all have advantages and disadvantages. All that counts, however, is whether Mavis can spot and stop previously unknown malware even when the "smell" is very faint or deliberately disguised to confuse it into an unfortunate misclassification.

A major problem for developers lies in proving that their implementation of Mavis intelligently detects unknown malware. How much training is enough? What happens when their Mavis encounters a completely new threat that smells clean? Do we need a second, signature-based system until we're 100% certain it's getting it right every time? Some vendors prefer a layered approach, while others go all in with their version of Mavis.
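The layered approach mentioned above can be sketched roughly as follows: check a file against known signatures first, and only fall back to the learned model when no signature matches. Everything here is hypothetical, including the signature set, the toy scoring function and the 0.5 threshold. It illustrates the idea rather than any vendor's design.

```python
import hashlib

# Hypothetical signature database: SHA-256 hashes of known-bad files.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"EVIL-SAMPLE").hexdigest()}


def toy_model_score(data: bytes) -> float:
    """Stand-in for Mavis: returns a made-up 'maliciousness' score in 0..1.

    Here it is simply the fraction of 0x90 bytes, chosen only so the
    example has something to compute.
    """
    return data.count(b"\x90") / max(len(data), 1)


def layered_verdict(data: bytes, model_score=toy_model_score, threshold=0.5) -> str:
    """Layer 1: exact signature match. Layer 2: learned model score."""
    if hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256:
        return "malicious (signature)"
    if model_score(data) >= threshold:
        return "malicious (model)"
    return "clean"


known = layered_verdict(b"EVIL-SAMPLE")     # caught by the signature layer
unknown = layered_verdict(b"\x90" * 10)     # caught by the model layer
harmless = layered_verdict(b"hello world")  # passes both layers
```

The trade-off is the one the questions above point at. The signature layer is reliable only for threats already seen, while the model layer can catch new ones but may be wrong.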

Every next-generation security product vendor using machine learning says their approach is the best, which is entirely understandable. As with traditional AV products, however, the proof is in the testing. To gain trust in their AI-based products, vendors need to hand them over to independent labs for a thorough, painstaking workout. It's the best way for the public, private enterprises and governments to be sure that Mavis in its many guises will protect them without faltering.

