A single neuron
cannot form memory,
but when countless neurons gather, it becomes possible.
The same is true of bricks.
The science of one brick
and the architecture of a building
made from many bricks
are different problems.
Ah.
I happened to find this passage in an Aladin bookstore.
How could I possibly leave this book behind and come home alone?
It is a book filled with a physicist's worries and recollections. Perhaps one reason it carries such a strong scent of the humanities (p.101) is that it quietly leaves out the complex labor and struggle that were needed to arrive at the result, and instead shows only the result the author eventually experienced.
It has been a long time since I felt excited by sentence after sentence in a book like this.
I am still near the beginning, but after scribbling notes everywhere, I decided to save a few lines here before I forgot them.
The real world is disorderly, and as mentioned at the beginning, many phenomena in reality can be explained through the interaction of countless components. The interactions among those components can be expressed as simple rules, yet the collective behavior that emerges from them is extremely hard to predict.
The basic actors may be spins, atoms, molecules, neurons, and ordinary cells, but they may also include websites, stockbrokers, stocks and bonds, people, animals, and elements of ecosystems.
Not every interaction among basic actors creates a disordered system. Disorder comes from the fact that some basic actors behave differently from others. There are spins that try to align in reverse, atoms that move differently from most atoms, financial investors who sell stocks that others are buying, and dinner guests who, though invited, want to sit far away because they dislike another guest. p.103
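The dinner-guest picture can be tried out on the smallest possible example. The following is a minimal sketch, not taken from the book: three spins on a triangle, with an energy of the usual Ising form. The function names and coupling values are my own assumptions for illustration. When all three couplings favor alignment, every pair can be satisfied at once. When one coupling is reversed, at least one pair is always unhappy, and the number of equally good arrangements grows.

```python
from itertools import product

def energy(spins, couplings):
    """Ising-type energy E = -sum over pairs of J_ij * s_i * s_j, with s = +1 or -1."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def ground_states(n_spins, couplings):
    """Brute-force search over all 2**n_spins configurations (fine for tiny systems)."""
    configs = list(product((-1, 1), repeat=n_spins))
    e_min = min(energy(c, couplings) for c in configs)
    return e_min, [c for c in configs if energy(c, couplings) == e_min]

# Hypothetical couplings: J > 0 wants a pair aligned, J < 0 wants it opposed.
ordered = {(0, 1): 1, (1, 2): 1, (0, 2): 1}
frustrated = {(0, 1): 1, (1, 2): 1, (0, 2): -1}  # one "guest" who wants the opposite

e_ordered, gs_ordered = ground_states(3, ordered)
e_frustrated, gs_frustrated = ground_states(3, frustrated)

print(e_ordered, len(gs_ordered))        # lowest energy and number of lowest-energy states
print(e_frustrated, len(gs_frustrated))
```

In the ordered triangle the lowest energy is -3 and only the two fully aligned states reach it. In the frustrated triangle the lowest energy is only -1, and six of the eight states share it. A real spin glass has a vast number of spins and random couplings, so this toy only hints at why its low-energy landscape is so crowded.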
Most of the artificial intelligence commonly used in internet applications is based on spin-glass theory and artificial neural network theory. p.77
If AI takes its motif from neural networks, then perhaps neural networks themselves are one scene within spin-glass theory.
Phase transition is such a familiar form of "everyday physics" that we often fail to notice it. Yet for physicists it is a deeply fascinating phenomenon. Today everyone knows that when water reaches 100 degrees Celsius it starts boiling and changes from liquid to gas, and that when it drops below 0 degrees Celsius it changes from liquid to solid, that is, into ice. But why do such changes happen?
The physical quantities we measure macroscopically, like the temperature of water, are governed by the behavior of microscopic actors. Molecular speed is one example, though we cannot observe molecular motion directly.
To study phase transitions at the microscopic level, we have to understand the behavior of countless "objects" such as atoms, molecules, or tiny magnets. These "basic elements," which interact, exchange information, and revise their behavior based on the information they receive, can, in a more general context than traditional physics, be called agents. In physics, exchanging information is equivalent to exchanging forces, and in general objects behave differently depending on whether other objects are far away or close to them. p.90
That paper on disordered systems and spin glasses was far removed from my own research field at the time, and it was a subject I had never dealt with before. ... I studied the model and redid all the calculations. The calculations were right, but the results were not. p.80
The replica method that solved a problem in particle physics.
A system's phase transition is generally characterized by a change in an order parameter. For example, the order parameter used to study the phase transition between liquid and gas is density. In a magnetic phase transition, the order parameter becomes magnetization. These order parameters, such as density and magnetization, express physical meaning in values we can understand intuitively, and their values change when phase transitions occur.
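For reference, the two order parameters named here have simple textbook definitions. The notation below is the standard one and is my own addition, not the book's: $N$ is the number of particles or spins, $V$ the volume, and $s_i = \pm 1$ the value of spin $i$.

```latex
\rho = \frac{N}{V},
\qquad
m = \frac{1}{N} \sum_{i=1}^{N} s_i
```

Each is a single number: $\rho$ jumps between the liquid and the gas, and $m$ is zero in the disordered magnetic phase and nonzero in the ordered one.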
What surprised me was that in the result I obtained from spin-glass calculations, the order parameter was no longer a simple number that changes during the phase transition. What changed during the transition was a function. A single point was not enough to describe the phase transition; instead of one number, one had to use a function made up of infinitely many numbers. p.92
The way semiconductors can switch between 1 and 0 depending on conditions seems similar to the two-valued scalar spins of the Ising model. And in the sense that what changes is not merely a value but a function, it also feels somehow analogous to the ferromagnetic and antiferromagnetic couplings, quantum-like forms, or multidimensional matrices that appear in spin-glass models and other disordered systems whose components point in irregular directions of their own. Of course they are not the same, but certain parts resonate at an abstract level.
Then what does this function physically represent? Using a function instead of a number as the order parameter for phase transitions was the decisive step for adopting the replica method. When the order parameter was a single number, the replica method produced absurd results. But when the order parameter was a function, that is, a set of infinitely many numbers, just as a line can be thought of as a gathering of infinitely many points, the replica method produced consistent results. There must be some physically profound meaning related to the need for an infinite order parameter, a function, in order to explain the system's phase transition. p.92
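The book gives no formulas at this point, so what follows is only the standard textbook form of these ideas as I understand them, in notation that is my own assumption. The replica method computes the disorder-averaged logarithm of the partition function $Z$ from $n$ copies (replicas) of the system, and the natural order parameter is the overlap between two replicas $a$ and $b$:

```latex
\overline{\ln Z} = \lim_{n \to 0} \frac{\overline{Z^{n}} - 1}{n},
\qquad
q_{ab} = \frac{1}{N} \sum_{i=1}^{N} \left\langle s_i^{a} s_i^{b} \right\rangle
```

Assuming every $q_{ab}$ with $a \neq b$ equals one number $q$ corresponds to the single-number order parameter. In Parisi's scheme the overlap matrix is instead described, in the limit $n \to 0$, by a function $q(x)$ on the interval $0 \le x \le 1$, which is the "function made up of infinitely many numbers" in the passage above.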
The triple point of water is famous for a reason. In general, a system occupies one phase. By contrast, a low-temperature disordered system can occupy many very different phases at the same time. That is what it means for the order parameter to become a function, that is, a set of infinitely many values. Recognizing this was a genuine step forward in physics. Thanks to the construction of the model and its interpretation, we discovered phenomena we had not even known existed. We opened the door wide to the world of disordered systems. p.100
Seen this way, convolutional neural networks, recurrent neural networks, and transformers all feel, to me at least, analogous to these keywords: order parameters, functions, infinite sets of numbers, and the replica method. The resemblance is entirely my own reading, but the association is difficult to ignore.
Phase transitions occur through the interaction of many components with clear spatial positions. Simplified models discussed earlier did not take this into account. They omitted not only spatial structure but also change over time. The tools of statistical mechanics are easy to use when a system is in equilibrium, when its state remains stable even as time passes. But in disordered systems such as glass or wax, the time needed to reach equilibrium is often extremely long, years or even centuries. That is true even of the glass in my own windows, whose strength was improved by industrial techniques.
When a physical process is not in equilibrium, we can always distinguish before from after, so there is a sense of time. But in a system at equilibrium, this is not so. Put simply, if a ball is resting at the bottom of a valley in a stable equilibrium, photographs of that scene contain no sign of change, so we would not know how to place them in chronological order. But if we photograph the ball rolling downward, the situation changes, because in a non-equilibrium state the temporal order is clear.
Therefore the theory must be extended to describe non-equilibrium states that permit temporal change. It must also be extended to take spatial structure into account so that interactions exist only between neighboring components. This means there is still a great deal of work left before we can fully understand the phase transition of glass.
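A minimal sketch of what "interactions only between neighboring components" and "temporal change" look like in code, not taken from the book. It is the plain two-dimensional Ising model with a Metropolis update rule, which is an ordered magnet and not a glass, and the grid size, temperatures, and function names are assumptions chosen for illustration.

```python
import math
import random

def metropolis_sweep(grid, temperature, rng):
    """One Metropolis sweep of a 2D Ising model with nearest-neighbor coupling J = 1."""
    size = len(grid)
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        # Only the four nearest neighbors interact with spin (i, j); edges wrap around.
        neighbors = (grid[(i - 1) % size][j] + grid[(i + 1) % size][j]
                     + grid[i][(j - 1) % size] + grid[i][(j + 1) % size])
        delta_e = 2 * grid[i][j] * neighbors  # energy change if this spin flips
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            grid[i][j] = -grid[i][j]

def magnetization(grid):
    """Average spin value, between -1 and 1."""
    size = len(grid)
    return sum(sum(row) for row in grid) / (size * size)

def run(temperature, sweeps=200, size=16, seed=0):
    """Start fully ordered and record the magnetization after every sweep."""
    rng = random.Random(seed)
    grid = [[1] * size for _ in range(size)]
    history = []
    for _ in range(sweeps):
        metropolis_sweep(grid, temperature, rng)
        history.append(magnetization(grid))
    return history

cold = run(temperature=0.5)   # well below the ordering temperature
hot = run(temperature=10.0)   # well above it
print(abs(cold[-1]), abs(hot[-1]))
```

The `history` list is the sequence of photographs from the ball-in-the-valley example: in the hot run the first entries show the order melting away, so their temporal order is obvious, while the later entries only fluctuate around zero. A glass differs in that this relaxation may never finish on any time scale we can simulate or observe.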
A single neuron cannot form memory, but when many neurons gather, it becomes possible.
The same is true of bricks. The science of one brick and the architecture of a building made from many bricks are different problems.
The same may be true of relationships among people, but it also makes me think about the difference between today's single LLMs and mixture-of-experts (MoE) architectures. No matter how complete a single model becomes, perhaps it still resembles an abstracted statistical-mechanics technique that does not fully account for an infinitely changing disordered system. From that perspective, an MoE, a single LLM, and a set of LLMs may need to be designed with fundamentally different structures and properties.
Between Disorder and Order, Giorgio Parisi - Science Books
The first Korean edition of Giorgio Parisi's first popular science book.
www.aladin.co.kr
