Renewal·마흔의 생활코딩

GPT-4o API Review 1 | Universal Access to Multimodality

May 17, 2024·3 min read


Note! This is an extremely biased and subjective personal record. It is not about the technical or feature-level aspects of LLMs, but about how the range of human cognition shifts over time. I do not think I have ever written a review of a technology while doing Coding Everybody-style exercises. But while working through the GPT-4o API practice, especially the problem of finding the area of a shape drawn with lines, I was genuinely shocked. So this time I am recording some personal reflections from that process, reflections that may well embarrass me when I read them later.

A review of OpenAI GPT-4o API practice
1. First impression, universal access to multimodality
2. The background of that feeling, I think therefore I am (feat. hallucination)

In the history of philosophy, traditional thinkers such as Descartes and Hume regarded images as a kind of illusion, something that blocked true knowledge of reality. Later, Deleuze explained that images were not simple imitations of something else but "blocks that compose reality," and around the same period Sartre defined the image not as an incomplete falsehood but as a fundamental way in which we look at the world.
In my admittedly limited view, this may mark the transition from understanding pure "material substance" to recognizing the external world, or the other, from the standpoint of the actually existing subject.

In fact, even before the philosophical disputes over images, when printed text first appeared, the intellectuals of the time voiced fierce concern that specific knowledge was being endlessly reproduced and distributed to an unspecified mass of people.
The same caution has been directed at printed text, then at images, and more recently at video content, and sometimes at knowledge and even indirect experience. It reminds me of Michel Foucault's point: perhaps the privilege of pure thought, or what human beings define as common sense, keeps changing with each era.

The image below deals with the concerns of philosophers who were thinking at the moment when humanity had only just begun to recognize image-based and bodily visual cognition, along with the various objects that made use of it, and when that knowledge was just starting to circulate. Now we have gone beyond text and images, beyond screens and still pictures, into the age of moving images. It feels like another indirect way in which human beings can look at one another as if from the outside.

This may be confirmation bias, and a severely overgeneralized one at that. But as someone who used feature phones, then PDAs, and then smartphones, I felt that the full-scale beginning of universal access to personal multimodality, thanks in part to GPT-4o, was an extremely important turning point.
This seems to me like another major phase, comparable to the expansion of human perception and cognition that came with the move from oral culture to print, from print to images, and from images to video, along with the polarization of that diversity. What kind of new phase will emerge as people encounter the technology now called AI, sometimes actively and sometimes passively? It feels like an important moment.

All this time, humans have made tools, smartphones above all, and those tools, along with the institutions, cultures, and knowledge that grew up differently around them, have shaped humans in return. So what kind of change will AI, which humans made and still think of as a tool, bring about in humans this time?

Just as memories and emotions come back when I think of feature phones and MP3 players, I hope the changes of today can also be remembered one day as a single memory. Of course, I am sure the people working in that field shed blood, sweat, and tears at the time. And of course, this era will probably be remembered as one of the busiest periods not only in IT but also in sociology and philosophy.

That is all.

I am simply recording what one ordinary person felt, along with a brief reflection on this era.

This English version was translated by Codex.

Written by

친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
