Wouldn’t it be wonderful if computers could figure out how to solve problems on their own, without precise instructions? Neural networks hold this promise, but scientists must use them with caution – or risk discovering that they have solved the wrong problem entirely, writes Janelle Shane
Artificial neural networks are a form of machine-learning algorithm with a structure roughly based on that of the human brain. Like other kinds of machine-learning algorithms, they can solve problems through trial and error without being explicitly programmed with rules to follow. They’re often called “artificial intelligence” (AI), and although they are much less advanced than science-fiction AIs, they can control self-driving cars, deliver ads, recognize faces, translate texts and even help artists design new paintings – or create bizarre new paint colours with names like “sudden pine” and “sting grey”.
Neural networks were first developed in the 1950s to test theories about the way that interconnected neurons in the human brain store information and react to input data. As in the brain, the output of an artificial neural network depends on the strength of the connections between its virtual neurons – except in this case, the “neurons” are not actual cells, but connected modules of a computer program. When the virtual neurons are connected in several layers, this is known as deep learning.
A learning process tunes these connection strengths via trial and error, attempting to maximize the neural network’s performance at solving some problem. The goal might be to match input data and make predictions about new data the network hasn’t seen before (supervised learning), or to maximize a “reward” function in order to discover new solutions to a problem (reinforcement learning). The architecture of a neural network, including the number and arrangement of its neurons, or the division of labour between specialized sub-modules, is usually tailored to each problem.
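As a minimal sketch of what tuning a connection strength by trial and error can look like, the Python below trains a toy “network” with a single connection. Everything in it (the function names, the example data and the random-search strategy) is an illustrative assumption rather than a method described in this article; practical neural networks have many connections and are usually trained with gradient-based methods such as backpropagation.

```python
import random

def loss(weight, examples):
    """Mean squared error of a one-connection 'network', y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

def train(examples, steps=2000, step_size=0.1, seed=0):
    """Trial and error: nudge the weight at random, keep only improvements."""
    rng = random.Random(seed)
    weight = 0.0
    best = loss(weight, examples)
    for _ in range(steps):
        candidate = weight + rng.uniform(-step_size, step_size)
        candidate_loss = loss(candidate, examples)
        if candidate_loss < best:
            weight, best = candidate, candidate_loss
    return weight

# Hypothetical training examples generated from y = 3x.
examples = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
learned_weight = train(examples)
print(learned_weight)
```

The program is never told that the answer is 3; it only sees the examples and a score, yet the learned weight ends up close to the value used to generate them.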
The growing availability of cheap cloud computing and graphics processing units (GPUs) is a key factor behind the rise of neural networks, making them both more powerful and more accessible. The availability of large amounts of new training data, such as databases of labelled medical images, satellite images or customer browsing histories, has also helped boost the power of neural networks. In addition, the proliferation of new open-source tools such as TensorFlow, Keras and Torch has helped make neural networks accessible to programmers and non-programmers from a variety of fields. Finally, success begets success: as the value of neural networks in commercial applications becomes more apparent, developers have sought new ways of exploiting their capabilities – including using them to aid scientific research.
Neural networks are great at matching patterns and finding subtle trends in highly multivariate data. Crucially, they make progress towards their goal even if the programmer doesn’t know how to solve the problem ahead of time. This is useful for problems with solutions that are complex or poorly understood. In image recognition, for example, the programmer may not be able to write down all the rules for determining whether a given image contains a cat, but given enough examples, a neural network can determine for itself what the important features are. Similarly, a neural network can learn to identify the signature of a planetary transit without being told which features are important. All it needs is a set of sample starlight curves that correspond to planetary transits, and another set of light curves that do not. This makes neural networks an unusually flexible tool, and the fact that neural network frameworks come in “flavours” specialized for tasks such as classifying data, making predictions, and designing devices and systems only adds to their flexibility.
Neural networks are also particularly well suited for projects that generate too much data to be easily sorted or stored, especially if the occasional mistake can be tolerated. Often, they’re used to flag events of interest for human review. In a 2017 study of exoplanet candidates, for example, software engineer Christopher Shallue of Google Brain and astronomer Andrew Vanderburg of the University of Texas at Austin used neural networks to search lists of candidate light curves for those most likely to correspond to true planetary transits. The results enabled them to reduce the number of candidates by more than an order of magnitude. In another astronomy application, a team from the Observatoire de Sauverny in Switzerland used a neural network to examine huge datasets of galaxy images, looking for those that might contain gravitational lenses. Other groups have used neural network classifiers to identify rare, interesting collision events in data from the Large Hadron Collider at CERN.
Another kind of neural network can generate predictions based on input data. Networks of this type have, for example, been used to predict the absorption spectrum of a nanoparticle based on its structure, after being given examples of other nanoparticles and their absorption spectra. Such networks are being used in chemistry and drug discovery as well, for example to predict the binding affinities of proteins and ligands based on their structures.
In combination with a technique called reinforcement learning, neural networks can also be used to solve design problems. In reinforcement learning, rather than trying to imitate a list of examples, a neural network tries to maximize the value of a reward function. For example, a neural network controlling the limbs of a robot might adjust its own connections in a way that, through trial and error, ends up maximizing the robot’s horizontal speed. Another algorithm might control the spectral phase of an ultrashort laser pulse, trying to maximize the ratio of two fragmentation products generated when the laser pulse hits a certain molecule.
Because neural network algorithms solve problems in whatever ways they can manage, they sometimes arrive at solutions that aren’t particularly useful – and it can take an expert to detect how and where they have gone wrong. Hence, they are not a substitute for a good understanding of the problem. Below are a few possible pitfalls.
In general, neural networks (and other machine-learning algorithms) don’t explain how they arrived at their solutions. This can make it harder to understand whether these solutions are exploiting new physics, or are based on a bug or some simple effect that has been overlooked. Machine-learning research is full of anecdotes of algorithms arriving at seemingly perfect solutions that turn out to stem from problems with the algorithm itself. For example, in 2013 researchers at MIT Lincoln Labs tested a computer program that was supposed to learn to sort a list of numbers. It achieved a perfect score, but then the programmers discovered that it had done so by deleting the list. (According to the algorithm’s reward function, a deleted list yielded a perfect score because, technically, the list was no longer unsorted.) In another example, a machine-learning algorithm was used to shape laser pulses to selectively fragment molecules. Although the resulting laser pulses were very complex, in many cases the dominant effect turned out to be the overall change in laser pulse intensity rather than the pulse’s complex structure.
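The list-sorting anecdote can be illustrated with a short Python sketch. The reward functions below are hypothetical stand-ins written for this illustration, not the ones used in the program described above; they simply show how a score that only penalizes out-of-order items can be satisfied by an empty list, and how one extra check closes that loophole.

```python
def unsorted_pairs(numbers):
    """Count adjacent pairs that are out of order (0 means 'sorted')."""
    return sum(1 for a, b in zip(numbers, numbers[1:]) if a > b)

def naive_reward(output):
    # Hypothetical reward: penalize out-of-order pairs and nothing else.
    return -unsorted_pairs(output)

def safer_reward(original, output):
    # Also insist that the output still contains exactly the original items.
    if sorted(original) != sorted(output):
        return float("-inf")
    return -unsorted_pairs(output)

data = [3, 1, 2]
honest = sorted(data)  # a genuine solution
cheat = []             # "solving" the task by deleting the list

print(naive_reward(data), naive_reward(honest), naive_reward(cheat))
print(safer_reward(data, honest), safer_reward(data, cheat))
```

Under the naive reward, the deleted list scores exactly as well as the correctly sorted one. The safer version rejects it, but only because someone anticipated that particular shortcut.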
To combat this problem, researchers are working on algorithmic interpretability, developing techniques for discovering how algorithms make their decisions. For example, some image-recognition algorithms can now report which pixels were important in making their decisions, and individual layers of neurons can report which kinds of features (like a dog’s floppy ear) they have learned to find.
Solving the wrong problem
Users of neural networks also have to make sure their algorithm has actually solved the correct problem. Otherwise, undetected biases in the input datasets may produce unintended results. For example, Roberto Novoa, a clinical dermatologist at Stanford University in the US, has described a time when he and his colleagues designed an algorithm to recognize skin cancer – only to discover that they’d accidentally designed a ruler detector instead, because the largest tumours had been photographed with rulers next to them for scale. Another group, this time at the University of Washington, demonstrated a deliberately bad algorithm that was, in theory, supposed to classify husky dogs and wolves, but actually functioned as a snow detector: they’d trained their algorithm with a dataset in which most of the wolf pictures had snowy backgrounds.
Careful review of an algorithm’s results by human experts can help detect and correct these problems. For example, the abovementioned study on planetary transits flagged suspected exoplanets for human review, rather than simply generating a “We found a new planet!” press release. This was fortunate because most of the “exoplanets” turned out to be artefacts that the algorithm had not learned to detect.
Class imbalances and overfitting
When researchers try to train data-classifying machine-learning algorithms, they often run into a problem called class imbalance. This means that they have many more training examples of one data category than of the others, which is often the case for studies that are searching for rare events. The result of class imbalance can be an algorithm that doesn’t have enough data to make progress, yet “thinks” it is doing splendidly. To cite one recently reported example from the solar storm team at NASA’s Frontier Development Lab, if solar flares are very rare in the training dataset, the algorithm can achieve near-perfect accuracy by predicting zero solar flares. This is also a problem for planetary transit studies because true planetary transits are relatively rare.
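The arithmetic behind this trap is easy to reproduce. In the sketch below, the dataset (10 flares among 1000 examples) and the function names are assumptions chosen for illustration, not figures from the NASA team. A “model” that always predicts “no flare” is scored both on accuracy and on recall, which is the fraction of real flares it actually catches.

```python
def accuracy(labels, predictions):
    """Fraction of all examples that were labelled correctly."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def recall(labels, predictions):
    """Fraction of the true positives (label 1) that were found."""
    positives = sum(labels)
    found = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    return found / positives if positives else 0.0

# Hypothetical dataset: 1 = flare, 0 = no flare, and flares are rare.
labels = [1] * 10 + [0] * 990
predictions = [0] * len(labels)  # always predict "no flare"

print(accuracy(labels, predictions))  # 0.99
print(recall(labels, predictions))    # 0.0
```

An accuracy of 99% looks splendid, but the recall of zero shows that not a single flare was detected. Reporting a per-class measure alongside accuracy is one simple way to expose this.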
To address class imbalance, the rule of thumb is to include roughly equal numbers of training examples in each category. Data-augmentation techniques can help with this. However, using data augmentation, or simulated data, can lead to another problem: overfitting. This is one of the most persistent problems with neural networks. In short, the algorithm learns to match its training data very well, but isn’t able to generalize to new data. One likely example is the Google Flu algorithm, which made headlines in the early 2010s for its ability to anticipate flu outbreaks by tracking how often people searched for information on flu symptoms. However, as new data started to accumulate, Google Flu turned out to be much less accurate, and its reported success is now thought to be due to overfitting. In another example, an algorithm was supposed to evolve a circuit that could produce an oscillating signal; instead, researchers at the University of Sussex and Hewlett-Packard Labs in Bristol, UK, found that it evolved a radio that could pick up an oscillating signal from nearby computers. This is a clear example of overfitting because the circuit would only have worked in its original lab environment.
The way to detect overfitting is to test the model against data and situations it hasn’t seen. This is especially important if the model was trained on simulated data (like simulated images of gravitational lenses, or simulated physics), to make sure the model hasn’t learned to use artefacts of the simulation.
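A held-out test of this kind can be sketched in a few lines of Python. The data, the two toy models and all of the names below are assumptions made for illustration. One model deliberately overfits by memorizing its training points, noise and all, while the other fits a simple straight line; both are then scored on data they have and haven’t seen.

```python
import random

def make_data(n, rng):
    """Hypothetical noisy measurements of y = 2x."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]

def mean_squared_error(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_memorizer(train):
    """Overfit on purpose: return the answer of the nearest training point."""
    def model(x):
        nearest = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return model

def fit_line(train):
    """Least-squares straight line through the origin, y = slope * x."""
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: slope * x

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
memorizer, line = fit_memorizer(train), fit_line(train)

for name, model in (("memorizer", memorizer), ("line", line)):
    print(name,
          "train error:", mean_squared_error(model, train),
          "test error:", mean_squared_error(model, test))
```

The memorizer is perfect on its own training data, so judged on that alone it looks like the better model. Only the unseen test set reveals that it has learned the noise rather than the trend, which is the gap between training and test performance that signals overfitting.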
Neural networks can be a very useful tool, but users must be careful not to trust them blindly. Their impressive abilities are a complement to, rather than a substitute for, critical thinking and human expertise.