Renewal·마흔의 생활코딩

Opening the textbook again at forty 01 for deep learning (feat. regression theory)

normalstory


To learn deep learning and machine learning, I'm carefully going back over the Python I rushed through (hacked together, really) a few months ago for a stock-trading system.
Along the way, I'm having a pretty striking experience.
I'm revisiting the data structures I first studied 3 years ago while learning Java. And I even opened up the Jeongseok textbook from 30 years ago. Cue the sound effects. (Ah, of course... obviously... sprinkling a little MSG for the rhyme.)



The fact that I still have this book at home is even more surprising... =,.=



Machine learning is built on geometry and vectors. It represents them as tensors, whereas people (and older computing) worked through them with linear algebra. The two aren't opposing approaches; rather, tensors extend linear algebra to more dimensions. Linear algebra works with vectors and matrices, which are stored as arrays, and training a model on top of them also calls for differentiation and integration.
So... I ended up opening the Jeongseok textbook..
Thankfully ;D, these complicated formulas can be computed easily with the NumPy module. Of course, to use NumPy effectively (it handles the linear algebra for you), you need to understand the array-based data structures that hold multi-dimensional numbers..
So... 2, I'm going back to Java again.
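To make the "array-based data structures" a bit more concrete, here is a minimal NumPy sketch. The names (`vector`, `matrix`, `tensor`, `weights`) and all the numbers are made up purely for illustration; the only point is how 1-D, 2-D, and 3-D arrays look and how one matrix product computes a weighted sum per row.

```python
import numpy as np

# Hypothetical values, chosen only to show shapes and one basic operation.
vector = np.array([2, 4, 6, 8])                        # 1-D array, shape (4,)
matrix = np.array([[2, 0], [4, 4], [6, 2], [8, 3]])    # 2-D array, shape (4, 2)
tensor = np.zeros((2, 4, 2))                           # 3-D array, shape (2, 4, 2)

weights = np.array([1.5, 2.0])                         # one weight per column of `matrix`

# Matrix-vector product: a weighted sum for every row at once.
weighted_sums = matrix @ weights

print(vector.shape, matrix.shape, tensor.shape)
print(weighted_sums)
```

The `@` operator is the same weighted sum that the linear-regression section writes out by hand as ax + b (minus the b).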

TMI...
Linear algebra... I have a personal history with it... literally decades ago... My SAT-equivalent score was oddly good... I graduated from the humanities track and was admitted to the Information and Communication department through a special-track admission. In the first semester of freshman year, the shock of this linear-algebra class sent me fleeing to the military. After discharge, I transferred majors...-,.- lol. If only NumPy had existed when I was in college... ㅜㅜ NumPy was created in 2005 ㅜㅜ.. what is this... I feel like a fossil... ㅡ,.ㅜ sniff..


So, in bits and pieces, I'm studying Jeongseok and NumPy, and kicking off my deep-learning study.


Machine learning, deep learning

  • Its foundation: regression (prediction) theory
    • Use collected historical data sets ([..]x = independent variables, y = dependent variable) to predict the future
    • If you can plot cause (x) and effect (y) on a graph,
    • you can predict the future along the pattern (line) formed by those points.
    • The values being predicted are either
      • continuous values (linear) = regression, e.g., predicting a student's future grades
      • discrete values = classification (logistic regression), e.g., predicting future success/failure, or multi-class prediction


Linear regression (a predictive line for predicting the future)

  • When there is 1 input value x
    • y = ax + b 
      • x = independent variable, y = dependent variable
    • (linear function, 2D = weighted-sum function) ⇒ least-squares or gradient descent
      • e.g., student's grade y varies based on [info..] x.
    • Predictive line: to properly express the linear function,
      • you need a (slope) and b (intercept, bias)
      • Use a (weight, importance) and b (threshold) to identify the pattern that determines y
      • Least-squares method (the formula below applies only when a single independent variable x determines the dependent variable y)
      • Formula to find the slope a that draws the most accurate straight line (predictive line)
    • slope = sum of the products of x's and y's deviations / sum of x's squared deviations (deviation = difference between each value and the mean)
      • a = sum of (x - x_mean)(y - y_mean) / sum of (x - x_mean)**2
      • b = y_mean - a * x_mean (the fitted line passes through the point of the means)
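The slope formula above can be sketched in a few lines of NumPy. This is only a minimal sketch, using the student grade dataset from the hands-on notes listed at the end of this post. The intercept line (b = y_mean - a * x_mean) is the standard companion to the slope formula, and the variable names are my own choices.

```python
import numpy as np

# Student grade dataset from the hands-on notes listed later in this post.
x = np.array([2, 4, 6, 8])
y = np.array([81, 93, 91, 97])

x_mean = x.mean()
y_mean = y.mean()

# Slope: sum of products of deviations / sum of squared x deviations.
numerator = np.sum((x - x_mean) * (y - y_mean))
denominator = np.sum((x - x_mean) ** 2)
a = numerator / denominator

# Intercept: the fitted line passes through (x_mean, y_mean).
b = y_mean - a * x_mean

print(f"slope a = {a:.2f}, intercept b = {b:.2f}")
# prints: slope a = 2.30, intercept b = 79.00
```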

 

  • When there are multiple input values x
    • Because the points on the graph don't fall on a single line, you need to measure the error at each point and draw the predictive line so that it passes as close as possible to as many points as possible
    • error = actual value - predicted value
    • size (sum) of errors = errors can be positive or negative and would cancel out when added, so we 'square' each one before summing
    • Mean Squared Error (evaluation method) MSE = the sum of squared errors divided by the number of points n
      • Since you can't know the right line at first, this method evaluates an arbitrary (predicted) line against the actual values, then improves it (reduces the error)
        • Draw an arbitrary predictive line
        • Compare the actual value with the predictive line to evaluate error = actual - predicted = Y - Y_pred
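The two steps above (draw an arbitrary line, then evaluate its error) might look like this in NumPy. The starting values `a_guess = 3` and `b_guess = 76` are an arbitrary assumption for illustration, not a fitted result, and the helper names `predict` and `mse` are hypothetical.

```python
import numpy as np

# Student grade dataset from the hands-on notes listed later in this post.
x = np.array([2, 4, 6, 8])
y = np.array([81, 93, 91, 97])

# Arbitrary predictive line (assumed values, chosen only for illustration).
a_guess, b_guess = 3.0, 76.0

def predict(x, a, b):
    """Predicted value on the line y = ax + b."""
    return a * x + b

def mse(y_true, y_pred):
    """Mean of the squared errors (actual - predicted)."""
    return np.mean((y_true - y_pred) ** 2)

y_pred = predict(x, a_guess, b_guess)
errors = y - y_pred          # actual - predicted, per point

print("errors:", errors)     # -1, 5, -3, -3
print("MSE:", mse(y, y_pred))  # (1 + 25 + 9 + 9) / 4 = 11.0
```

A smaller MSE means the line sits closer to the points, which is the number the next steps try to push down.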

Error graph

  • Properties of differentiation
    • For f(x)=a with a constant (unchanging) a, the derivative is 0
    • For f(x)=x, the derivative is 1
    • For f(x)=ax with constant a, the derivative is a
    • For f(x)=$x^a$ with natural number a, the derivative is $ax^{a-1}$
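As a sanity check, the four rules above can be compared against a numerical derivative. This is a rough sketch under my own assumptions: `numerical_derivative` is a hypothetical helper using a central difference, and the constant `a = 3` and the evaluation point `x0 = 2.0` are arbitrary.

```python
def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a = 3       # hypothetical constant (a natural number, so x^a applies too)
x0 = 2.0    # hypothetical point at which to evaluate the derivatives

# Each entry: (function, derivative at x0 predicted by the rule)
checks = {
    "f(x) = a":   (lambda x: a,      0.0),
    "f(x) = x":   (lambda x: x,      1.0),
    "f(x) = ax":  (lambda x: a * x,  float(a)),
    "f(x) = x^a": (lambda x: x ** a, a * x0 ** (a - 1)),
}

for name, (f, expected) in checks.items():
    approx = numerical_derivative(f, x0)
    print(f"{name}: numerical = {approx:.4f}, rule = {expected:.4f}")
```

The numerical and rule-based columns should agree to several decimal places; tiny differences come from the finite step `h`.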


Gradient descent

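  • Improves the predictive line by adjusting a and b a little at a time in the direction that lowers MSE. As a 'preprocessing step', express MSE in terms of a and b:
    • Substitute the predictive line Y_pred into MSE
    • Y_pred = ax+b
    • $\frac{1}{n}\sum(Y-Y_{pred})^2$ → $\frac{1}{n}\sum(Y-(ax+b))^2$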


Partial derivative (with a as a constant)

  • For $f(x_1, x_2) = x_1^2 + x_1x_2+a$
    • when multiple variables are in an expression, you don't differentiate all variables
    • you differentiate only the one variable you want and treat the others as constants
  • Partial-differentiate f(x1, x2) with respect to $x_1$
    • $∂f/∂x_1=2x_1 + x_2$
  • Partial derivative with respect to a = $\frac{∂}{∂a}MSE(a, b) = \frac{2}{n}\sum(ax+b-y)x$ = -(2/len(x)) * sum(x * (y - y_pred))
  • Partial derivative with respect to b = $\frac{∂}{∂b}MSE(a, b) = \frac{2}{n}\sum(ax+b-y)$ = -(2/len(x)) * sum(y - y_pred)
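Plugging the two partial derivatives above into a loop gives a minimal gradient-descent sketch. The learning rate `lr = 0.03`, the `epochs = 2001` count, and the starting point (a = 0, b = 0) are assumptions I picked for this small dataset; a noticeably larger learning rate makes the updates overshoot and diverge here.

```python
import numpy as np

# Student grade dataset from the hands-on notes listed later in this post.
x = np.array([2, 4, 6, 8], dtype=float)
y = np.array([81, 93, 91, 97], dtype=float)

a, b = 0.0, 0.0     # arbitrary starting line (assumed)
lr = 0.03           # learning rate (assumed)
epochs = 2001       # number of update steps (assumed)

for i in range(epochs):
    y_pred = a * x + b
    error = y - y_pred

    # Partial derivatives of MSE with respect to a and b.
    a_diff = -(2 / len(x)) * np.sum(x * error)
    b_diff = -(2 / len(x)) * np.sum(error)

    # Step against the gradient.
    a -= lr * a_diff
    b -= lr * b_diff

    if i % 500 == 0:
        print(f"epoch {i}: a = {a:.4f}, b = {b:.4f}")

print(f"final: a = {a:.2f}, b = {b:.2f}")
```

With these settings the loop should settle near a = 2.3 and b = 79, which is the same line the least-squares formula gives for this data.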



Hmm.. with the default Tistory settings, there's a limit to posts..
So, back to a Notion link
https://www.notion.so/thinknormal/41c12d969d8948da83d108e703fab227

 






And the related regression-theory hands-on notes:

Least-squares method (simple linear regression | 2D function)

Student grade dataset x = [2,4,6,8] y= [81,93,91,97]


Gradient descent (simple linear regression | 2D function)

Student grade dataset x = [2,4,6,8] y= [81,93,91,97]


Gradient descent (multiple regression | multi-dimensional function)

Student grade dataset x1 = [2, 4, 6, 8] + x2 = [0, 4, 2, 3] y= [81, 93, 91, 97]
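The linked note itself isn't reproduced here, so the following is only my own sketch of how the single-variable gradient descent extends to two inputs: the prediction becomes a1*x1 + a2*x2 + b, and each weight gets its own partial derivative of MSE. The learning rate `lr = 0.02`, `epochs = 5001`, and the zero starting point are assumptions.

```python
import numpy as np

# Dataset from the line above.
x1 = np.array([2, 4, 6, 8], dtype=float)
x2 = np.array([0, 4, 2, 3], dtype=float)
y = np.array([81, 93, 91, 97], dtype=float)

a1, a2, b = 0.0, 0.0, 0.0   # arbitrary starting plane (assumed)
lr = 0.02                   # learning rate (assumed)
epochs = 5001               # number of update steps (assumed)
n = len(y)

for i in range(epochs):
    y_pred = a1 * x1 + a2 * x2 + b
    error = y - y_pred

    # One partial derivative of MSE per parameter.
    a1_diff = -(2 / n) * np.sum(x1 * error)
    a2_diff = -(2 / n) * np.sum(x2 * error)
    b_diff = -(2 / n) * np.sum(error)

    a1 -= lr * a1_diff
    a2 -= lr * a2_diff
    b -= lr * b_diff

print(f"a1 = {a1:.4f}, a2 = {a2:.4f}, b = {b:.4f}")
```

For this data the loop should approach roughly a1 = 1.5, a2 = 2.29, b = 77.86, the exact least-squares solution for these four points.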


Gradient descent (logistic regression | binary classification)

Student grade dataset x = [2, 4, 6, 8, 10, 12, 14] y= [0, 0, 0, 1, 1, 1, 1] #0 = fail, 1 = pass
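Again, the linked note isn't reproduced here; this is a hedged sketch of one common way to set it up, not necessarily how the note does it. It passes ax + b through a sigmoid to get a pass probability and lowers the log loss (cross-entropy) instead of MSE, using full-batch updates. `sigmoid`, `log_loss`, `lr = 0.05`, and `epochs = 2001` are all my assumptions.

```python
import numpy as np

# Dataset from the line above: 0 = fail, 1 = pass.
x = np.array([2, 4, 6, 8, 10, 12, 14], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 1], dtype=float)

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p):
    """Mean cross-entropy; p is clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

a, b = 0.0, 0.0     # arbitrary start (assumed)
lr = 0.05           # learning rate (assumed)
epochs = 2001       # number of update steps (assumed)

losses = []
for i in range(epochs):
    p = sigmoid(a * x + b)          # predicted pass probability
    losses.append(log_loss(y, p))

    # Gradient of the mean log loss with respect to a and b.
    a -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because fail and pass are perfectly separated in this tiny dataset, a and b keep drifting rather than settling on fixed values, so the sketch tracks the loss going down instead of a final pair of numbers.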


 

 

This English version was translated by Claude.

Written by
친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
