Renewal·마흔의 생활코딩

Prompt Engineering?

normalstory

April 8, 2024·3 min read


Focusing only on how to use ChatGPT well can be very dangerous, both for individuals and for the organizations they belong to. It only deepens dependency across the whole thought process: problem definition, problem solving, and even problem recognition.

The keyword ChatGPT has already become muddied with auto-monetization pitches and marketing jargon. I have written this before, but even prompt engineering is something ChatGPT does better than humans.

ChatGPT has already turned into a generic name, like Walkman. And lately, more and more LLMs are emerging that are both cheaper and better. The problem is that the prompting practices that get the best performance out of them differ from one model to the next.

So, to use them as tools without being dominated by them, we should build a selective relationship rather than a dependent one. For that, what matters most is not studying prompt-engineering techniques for a specific brand but building an environment in which you can design and build your own AI agents directly. https://www.youtube.com/watch?v=fsIipBuM4Nc

Tools for this include CrewAI, DEVIKA, and AutoGen Studio. The further right you go in that list, the closer you get to composing agents with no code.

What are Agents?

Agents are entities with the capacity to act. "To act" here refers to what we do every day without thinking about it, and yet it is something people have reflected on for a very long time. What we call rational thought grew out of that long reflection.

Going far back, Aristotle said something like this: we deliberate not about ends but about means. Doctors do not deliberate whether to heal; orators do not deliberate whether to persuade. They take the goal as given and consider the methods and means of reaching it. In the end, action follows what the mind wants. So between where we are and where we want to be, we act by moving through many nodes, the small tasks along the path of action.
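The idea of taking the goal as given and searching only for the means can be sketched as a small path search. This is only an illustration: the task graph, the state names, and the `plan` function are hypothetical and do not come from the talk or from any agent framework.

```python
from collections import deque

# Hypothetical task graph: each state maps to the states reachable in one step.
STEPS = {
    "idea": ["outline", "prototype"],
    "outline": ["draft"],
    "prototype": ["draft"],
    "draft": ["review"],
    "review": ["published"],
    "published": [],
}


def plan(start, goal, steps):
    """Breadth-first search for a shortest path of states from start to goal.

    The goal is never questioned; only the route (the means) is searched.
    Returns the path as a list of states, or None if the goal is unreachable.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in steps.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


route = plan("idea", "published", STEPS)
print(route)
```

Each state on the returned route plays the role of one node, one small task between where we are and where we want to be.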

In the 13th century, Ramon Llull, who has been called the father of computer science and theory, argued through his Logical System to Discover the Truth that human thought follows something like a logical calculus. In the 19th century, Ada Lovelace, who wrote Note G (often called the first algorithm), said thought might be something like a reasoning machine. From Aristotle to Lovelace, these were thoughts held by people living more than two thousand years apart.

Starting with Alan Turing in 1950, and continuing with many others, symbolic AI emerged. But it ran into the endless edge cases of the real world, and a long AI winter followed.

Today's AI produces something like thought by predicting tokens, without spelling out the internal details. It builds on deep learning, which stacks many layers, and on reinforcement learning, which trains agents that follow certain rules in specific environments through reward and punishment. In 2016 this shifted toward more open environments via OpenAI.

Before 1980, the criterion for calling an AI intelligent was whether it achieved (our) goals. After that, the criterion shifted to whether it could achieve goals while taking into account what the agent itself perceives about its specific environment. Today, the bar for an agent is not merely to perform a specific function but to act ideally and autonomously alongside humans. The main capabilities are autonomous action, memory, social ability, reactivity, and proactivity.

What do Agents do? (and what can humans not do)

Computer scientist Stuart Russell argues in his book Human Compatible that the fundamental reason machines, once they can understand speech and text, can do things humans cannot is not the depth of their understanding but its scale.

We cannot compete with agents, but once we understand what they are, we can do many more things through them.

What do Agents do? (and what can a single Agent not do)

1. Iteration -> Fewer Hallucinations -> Improved Accuracy
On the HumanEval coding benchmark, GPT-3.5 (zero-shot) scored 48.1%. GPT-4 (zero-shot) did better at 67.0%. But once you add an iterative agent workflow, the performance gap between GPT-3.5 and GPT-4 shrinks significantly. In fact, wrapped in an agent loop, GPT-3.5 reaches up to 95.1% accuracy. - Andrew Ng
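A minimal sketch of such an iterative loop follows, assuming a generate, check, revise cycle. It is not the workflow behind the figures above. `fake_model` is a hypothetical stand-in for a real LLM call, and `run_tests` and `agent_loop` are illustrative names.

```python
from typing import Callable, Optional, Tuple


def fake_model(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call.

    The first draft is buggy; once it receives feedback it returns a fix.
    """
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_tests(source: str) -> Optional[str]:
    """Run the candidate code; return None on success, else a feedback message."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable here only because the input is our own stub
    result = namespace["add"](2, 3)
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


def agent_loop(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate, check, and feed the failure back until the check passes."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


code, rounds = agent_loop(fake_model, run_tests)
print(rounds)
```

The point of the loop is that the model does not need to be right on the first try; it needs a checker and a chance to revise.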

2. Offload decision-making and prioritization
Multiple agents can share the decision-making and prioritization workloads that a single agent cannot handle alone.
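One way to picture this offloading is a coordinator that orders tasks by priority and hands each one to a specialist. The sketch below is only an illustration: the `researcher` and `writer` functions, the `WORKERS` table, and `dispatch` are hypothetical and are not the API of CrewAI, DEVIKA, or AutoGen Studio.

```python
import heapq


# Hypothetical specialist agents; in a real system each would wrap a model call.
def researcher(task: str) -> str:
    return f"researcher handled: {task}"


def writer(task: str) -> str:
    return f"writer handled: {task}"


WORKERS = {"research": researcher, "write": writer}


def dispatch(tasks):
    """Route tasks to specialists in priority order.

    tasks: list of (priority, kind, description); a lower number runs first.
    The original index is kept as a tie-breaker so equal priorities stay in order.
    """
    queue = [
        (priority, index, kind, text)
        for index, (priority, kind, text) in enumerate(tasks)
    ]
    heapq.heapify(queue)
    results = []
    while queue:
        _, _, kind, text = heapq.heappop(queue)
        results.append(WORKERS[kind](text))
    return results


print(dispatch([(2, "write", "summary"), (1, "research", "sources")]))
```

Here the prioritization lives in the coordinator and the actual work lives in the specialists, so neither has to carry the whole load.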

This English version was translated by Claude.

Written by

친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
