Chapter 9: Neuroscience Becomes AI Science
How DiCarlo and colleagues showed that an AI program could make the best model yet of the visual cortex
Note: This chapter is part of the proof-of-concept material for a book, AI: How We Got Here—A Neuroscience Perspective. If you like it, please consider supporting the Kickstarter project.
Around 2008, DiCarlo decided to turn his research program in a new direction. Formerly, he had focused on doing experiments with monkeys, and on understanding those experiments with statistics. Now, he would try to explain his experiments with bottom-up models of the visual cortex: models built from smaller models of its single components, most especially its neurons. His aim with these neural network models was twofold: to explain why neurons fired the way they did, and to explain how those firings led to behaviors, especially recognizing objects.
One of his biggest influences toward this shift was the pronounced success that computer scientists had recently achieved with neural network models. For example, by this time, they were routinely using computer programs, which solved the equations of neural networks, to recognize the account numbers on bank documents. But to understand how a model of a bunch of neurons could be repurposed this way, and how this inspired DiCarlo’s own efforts, it is helpful to first take a brief historical tour of neural networks.
When McCulloch and Pitts first created neural network models, in the 1940s, they did so by following a certain kind of tradecraft. This starts with envisioning reality as if it were simpler than it really is. Then, you make a model of this simplified picture of reality, with equations that realistically represent it, even if they do not realistically represent real reality… if I can say that.
In the simplified picture of reality they imagined, a neuron is similar to a simple bucket, connected to other buckets by simple hoses. Each hose has a one-way valve, the strength of which can be adjusted. These characteristics, and remarkably, only these, are what they created equations to represent. More specifically, their equations contain only three kinds of variables: the value of the signal coming out of a neuron; the values of the incoming signals arriving from other neurons; and the strengths of the connections carrying them. Knowing the latter two, you can solve for the value of the signal coming out.
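The bucket-and-valve picture can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the original 1943 formalism; the threshold value and valve strengths here are invented for the example:

```python
def neuron_output(incoming_signals, connection_strengths, threshold=1.0):
    """A McCulloch-Pitts-style neuron: sum each incoming signal,
    scaled by the strength of its one-way valve, and emit a signal
    only if the total crosses a threshold."""
    total = sum(s * w for s, w in zip(incoming_signals, connection_strengths))
    return 1 if total >= threshold else 0

# Two incoming hoses: a strong valve (0.9) and a weak one (0.2).
print(neuron_output([1, 1], [0.9, 0.2]))  # 1: combined inflow crosses the threshold
print(neuron_output([0, 1], [0.9, 0.2]))  # 0: the weak valve alone is not enough
```

Note that the valve strengths do all the interesting work: the same two inputs can fill or fail to fill the bucket, depending entirely on how the valves are set.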
As early as 1943, McCulloch and Pitts showed that even these very simple models, when assembled in sufficient combinations, could be very versatile at processing signals. But a more practical demonstration was given in 1958, in the work of Frank Rosenblatt. Rosenblatt showed how a neural network model could be used to attack very simple image processing problems, like deciding whether or not an image contained an airplane.
To see how this works, imagine that you have created a neural network, with a certain number of buckets, certain patterns of connections, and certain valve strengths, all pre-chosen. Further, ensure that these buckets are arranged in subsets, such that each subset only sends signals on to one other subset, and no other. This is exactly the arrangement you get if you stack the buckets in a tower, with each layer only sending signals to the layer below it (or to none at all, for the final layer), like a grand tower of champagne glasses. As long as you specify a set of input signals to the topmost layer, you can solve for the output signals of the next layer, and so on. The connection strengths are key, however; they are what allow the buckets to process signals in interesting ways, rather than merely overflowing at random like champagne glasses.
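The champagne-tower arrangement amounts to a layer-by-layer computation: solve one layer, pour the results into the next. Here is a toy sketch; the two-layer shape, thresholds, and weights are all invented for illustration:

```python
def layer_forward(inputs, weight_rows, threshold=0.5):
    """Compute one layer's outputs: each bucket in the layer sums
    the signals arriving through its own set of valves."""
    return [1 if sum(s * w for s, w in zip(inputs, row)) >= threshold else 0
            for row in weight_rows]

def network_forward(inputs, layers):
    """Pour signals down the tower: each layer's output feeds
    only the layer below it."""
    signals = inputs
    for weight_rows in layers:
        signals = layer_forward(signals, weight_rows)
    return signals

# A toy tower: 3 input signals -> 2 middle buckets -> 1 final bucket.
layers = [
    [[0.6, 0.6, 0.0],   # middle bucket 1 listens to inputs 1 and 2
     [0.0, 0.6, 0.6]],  # middle bucket 2 listens to inputs 2 and 3
    [[0.5, 0.5]],       # final bucket sums both middle buckets
]
print(network_forward([1, 1, 0], layers))  # [1]
print(network_forward([0, 0, 0], layers))  # [0]
```

Changing any valve strength in `layers` changes which input patterns end up filling the final bucket, which is exactly why the strengths, not the buckets, carry the network's knowledge.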
Rosenblatt created a neural network model in just this way, implemented on a primitive 1950s-era computer. It started with a first set, or layer, of 400 neuron models (or buckets), whose input signals were supplied by a 20x20 grid of light sensors, so that each pixel of a 400-pixel image was converted into a signal and sent into a bucket. Each of these buckets was then connected to another layer of buckets. Lastly, each of those intermediate buckets was connected to a final layer of just two more buckets.
Rosenblatt showed that if you adjusted the valves in the right ways, you could make this network tend toward filling up just one of the final two buckets, depending on whether or not the image contained an airplane. In other words, the network could perform a rough form of object recognition. He also provided an algorithm to do the valve adjustment. Each time an input image was categorized correctly or incorrectly, the algorithm told you how to incrementally adjust each valve strength to make the output a little more likely to be ‘correct’ for a similar image.
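The flavor of Rosenblatt's valve-adjustment rule can be sketched for a single bucket. This is a simplified, single-layer version, not his full procedure; the learning rate, threshold, and tiny dataset are invented for the example:

```python
def perceptron_train_step(weights, inputs, target, rate=0.1):
    """One step of a perceptron-style rule: nudge each valve in
    the direction that makes the output a little more likely to
    be correct for a similar input."""
    output = 1 if sum(w * x for w, x in zip(weights, inputs)) >= 0.5 else 0
    error = target - output  # -1, 0, or +1
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# Teach a two-valve bucket to fill only when both inputs are on.
weights = [0.0, 0.0]
examples = [([1, 1], 1), ([1, 0], 0), ([0, 1], 0), ([0, 0], 0)]
for _ in range(20):  # repeat the incremental nudges many times
    for inputs, target in examples:
        weights = perceptron_train_step(weights, inputs, target)

predict = lambda x: 1 if sum(w * v for w, v in zip(weights, x)) >= 0.5 else 0
print([predict(x) for x, _ in examples])  # [1, 0, 0, 0]
```

Each step is mindless arithmetic, yet repeated over many examples it settles the valves into strengths that categorize every example correctly.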
In ensuing years, it was this algorithm that would be considered of greatest benefit. It gave the impression of providing a way to teach the network to do the task you wanted; it is nowadays known as a training algorithm, or machine learning algorithm. To be clear, the algorithm itself is mindless; it is merely an automated means of optimizing the bucket valves. But with it, the Perceptron provided a proof of concept for how objects could be recognized by neural networks.
The initial discovery of these models—and the methods associated with them—provided something remarkable, if not quite what neuroscientists were looking for. Neuroscientists like DiCarlo wanted to explain neural signals. But the main thing that seemed to matter, with these models, was training them to accomplish the task you wanted, using image datasets. Using the models to explain how buckets were filled, or to what extent, seemed like an afterthought.
Over the next five decades, neural networks were increasingly adopted and refined as computational tools, rather than as brain models. By the 2000s, they had gone from conceptual curiosities, like the Perceptron, to being widely deployed in practical applications. They were developed mostly by practitioners of computer science and artificial intelligence (AI).
As DiCarlo became more acquainted with all this progress, he saw that AI scientists were actually working toward goals similar to those of neuroscientists. Many of them, working at companies like IBM or Xerox, didn’t know the slightest thing about the visual cortex. They wanted to make automated garage door openers and banal commercial products. But in a way, they were simply being another kind of biological reductionist: focusing on behavior, and making advances.
And so DiCarlo decided to stop worrying, for at least a moment, about whether the neural network models—the AI models—being created had much in common with the brain, or whether their sub-models of neurons were overly simplistic. Instead, he decided to push in the same direction, using their techniques, like increasingly powerful optimization algorithms, to pursue better neural networks of his own.
At this stage, around 2007, DiCarlo took a hard right turn into neural network research. He did this in part by recruiting new graduate students and collaborators who had stronger quantitative modeling expertise than the average neuroscientist. They included graduate students like Nicolas Pinto, as well as new postdoctoral researchers like Daniel Yamins.
With almost their first forays into the subject, DiCarlo and his new collaborators found that they could create neural networks whose object recognition performance was state-of-the-art, even compared with the work of AI scientists.1 After optimizing one such network on a large set of images, they found it could distinguish between new, previously unseen objects with as much as 67% accuracy. And within just a few more years, in 2014, they had published results showing how to create highly optimized neural networks that could recognize objects about as well as humans.2 This was momentous: a first instance of neuroscientists creating models of the visual cortex that performed as well as the human visual cortex at its chief behaviors.
But even more remarkably, they found that models with better and better performance also got better and better at explaining real neural responses from the IT cortex. For example, if the real neurons in a macaque monkey’s IT cortex all fired twenty times, in response to recognizing a eucalyptus tree, but a hundred times, in response to recognizing a banana, then the responses of the neurons in the model, read from a next-to-last layer of buckets, would tend to match those patterns of responses; perhaps twenty buckets would be filled all the way, in the former case, but one hundred, in the latter case (to make a crude analogy). In fact, in their best performing models, the correlations between model neuron and real neuron responses were as high as the correlations found from statistical models, like those they had discovered in 2005.
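The comparison just described boils down to correlating a model unit's responses with a recorded neuron's responses across the same set of images. Here is a toy sketch using NumPy; the firing rates and model responses are invented numbers, and real analyses of this kind additionally fit a linear mapping from model units to each recorded neuron before correlating:

```python
import numpy as np

# Invented firing rates of one real IT neuron across five images.
real_neuron = np.array([20.0, 100.0, 35.0, 80.0, 50.0])

# Invented responses of one model unit (from a next-to-last layer)
# to the same five images.
model_unit = np.array([0.18, 0.95, 0.40, 0.77, 0.52])

# Pearson correlation: how well the model unit's pattern of
# responses tracks the real neuron's pattern across images.
r = np.corrcoef(real_neuron, model_unit)[0, 1]
print(f"correlation: {r:.2f}")
```

A correlation near 1.0 would mean the model unit rises and falls across images just as the real neuron does, which is the sense in which these networks "explain" IT responses.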
With this study, DiCarlo, Yamins, and colleagues had just upended one of the major mysteries of neuroscience. They could now explain and predict the strange signals of the IT cortex, and show how those signals performed the most important visual behaviors.
But this discovery in neuroscience tended to be overshadowed. Computer scientists had preceded some of these results, in terms of publication dates. A year and a half earlier, in December 2012, three AI scientists had published a paper with their own version of an artificial neural network that performed object recognition at human levels.3 Their neural network was ‘deep’: composed of huge numbers of neuron sub-models, compared to the conventions of the time, and connected in greater numbers of layers. Their discovery ushered in the ‘deep learning revolution,’ whereby engineers could finally get AI programs to compete with humans.
However, it was not just a revolution in AI that happened. To laypersons, who are primarily exposed to the AI industry, it might have seemed that way. But DiCarlo’s results did not transport AI into neuroscience. They showed that the science of AI was the science of neuroscience. Using exactly the same methods, they had created the first compelling models of a brain region. They had done what the simulation aspirants had always set out to accomplish.
Krizhevsky, Alex, et al. “ImageNet Classification with Deep Convolutional Neural Networks.” Advances in Neural Information Processing Systems, vol. 25, Curran Associates, Inc., 2012. Neural Information Processing Systems, https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.