Somehow I turned forty,
and somehow I got into AI.
Since BERT came out about three or four years ago, NLP has been advancing at an almost scary pace.
Recently, T5 and Big Bird have been drawing a lot of attention.
*Progression of NLP models: (Vec2Vec → Seq2Seq) → BERT → XLNet → RoBERTa → MT-DNN → T5 → Big Bird
My recent personal interest
is testing how to apply these latest models, such as T5 and Big Bird, to Korean.
Below is my first Korean-language test.
It uses two methods: word2vec and LexRank.
1. gensim word2vec algorithm
- Algorithm summary
- Rather than representing each word by an arbitrary index (a one-hot vector), word2vec learns dense vectors so that words used in similar contexts end up with vectors pointing in similar directions.
- Once words are embedded with word2vec, you can even do arithmetic on them (the classic example: king - man + woman ≈ queen).
- Example code
- If you enter a URL, it will crawl the sentences on that page and summarize them → RUN !
Google Colaboratory notebook (colab.research.google.com)
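To make the two claims in the algorithm summary concrete (similar words point in similar directions; words support arithmetic), here is a tiny sketch. It is not the notebook's code: the 4-D vectors and their axis meanings are hand-made assumptions, whereas real word2vec vectors are learned from a corpus and usually have 100 to 300 dimensions.

```python
import math

# Hypothetical hand-made vectors, for illustration only.
# Assumed axes: [royalty, male, female, fruit].
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: compares the directions of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative),
    skipping the query words themselves (as gensim's most_similar does)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for w in positive:
        target = [t + x for t, x in zip(target, vectors[w])]
    for w in negative:
        target = [t - x for t, x in zip(target, vectors[w])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(cosine(TOY_VECTORS["king"], TOY_VECTORS["queen"]) >
      cosine(TOY_VECTORS["king"], TOY_VECTORS["apple"]))  # True
print(analogy(["king", "woman"], ["man"], TOY_VECTORS))    # queen
```

With a real model trained in gensim 4.x (for example `model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)` on your own tokenized sentences), the equivalent query is `model.wv.most_similar(positive=["king", "woman"], negative=["man"])`. gensim averages unit-normalized vectors instead of summing raw ones, which this toy version skips.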
-> Exercise results
1) Target original URL
2) Original content
3) Summary (three types)
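The post doesn't show how the notebook turns word vectors into a summary. One common approach, sketched here purely as a hypothetical illustration, is to average the word vectors of each sentence, average those into a document centroid, and keep the sentences closest to it. The toy 2-D vectors, the whitespace tokenizer, and the function names are all assumptions; for Korean you would normally tokenize with a morphological analyzer (for example one from KoNLPy) and use vectors from a trained word2vec model.

```python
import math

# Hypothetical toy 2-D word vectors (axis 0 ~ "NLP", axis 1 ~ "food").
# A trained word2vec model would supply real vectors instead.
TOY_VECTORS = {
    "bert":  [1.0, 0.0],
    "t5":    [0.95, 0.05],
    "nlp":   [1.0, 0.1],
    "model": [0.9, 0.1],
    "pizza": [0.1, 1.0],
    "lunch": [0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def summarize(sentences, vectors, k=2):
    """Keep the k sentences closest to the document centroid, in original order."""
    sent_vecs = [sentence_vector(s, vectors) for s in sentences]
    valid = [v for v in sent_vecs if v is not None]
    if not valid:
        return []
    centroid = [sum(col) / len(valid) for col in zip(*valid)]
    # Sentences with no known words get the lowest possible score.
    scores = [cosine(v, centroid) if v is not None else -1.0 for v in sent_vecs]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


doc = ["BERT is an NLP model", "I ate pizza for lunch", "T5 is another NLP model"]
print(summarize(doc, TOY_VECTORS, k=2))
# ['BERT is an NLP model', 'T5 is another NLP model']
```

The off-topic lunch sentence is dropped because its averaged vector points away from the centroid of the document.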
2. LexRank algorithm
- Algorithm summary
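- Like TextRank, it builds a graph in which each sentence of the document is a node and the similarity between two sentences (in LexRank, the cosine similarity of their TF-IDF vectors) is the weight of the edge between them.
- Once the graph is built, it runs PageRank on it to score each sentence and extracts the highest-scoring ones; in other words, it is an extractive summarization algorithm (it selects existing sentences rather than writing new ones).
- PageRank is the link-analysis algorithm on which Google's search engine was originally built.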
- Example code
- Assign any text to the variable, and it will summarize those sentences → RUN !
hangul_summarization_text.ipynb (Colaboratory notebook, colab.research.google.com)
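Since LexRank leans entirely on PageRank, here is a minimal textbook power-iteration sketch of PageRank on a small weighted graph. It is not the notebook's code; the damping factor 0.85 is the conventional default, and the function name and example graph are hypothetical.

```python
def pagerank(weights, damping=0.85, iterations=100, tol=1e-12):
    """Power-iteration PageRank; weights[i][j] is the weight of edge i -> j."""
    n = len(weights)
    ranks = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        # Every node first gets the "random jump" share.
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_sums[i] == 0:
                share = [1.0 / n] * n  # dangling node: spread rank evenly
            else:
                share = [w / out_sums[i] for w in weights[i]]
            for j in range(n):
                new[j] += damping * ranks[i] * share[j]
        delta = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        if delta < tol:
            break
    return ranks


# Nodes 0 and 1 both link to node 2; node 2 links back to node 0.
links = [[0, 0, 1], [0, 0, 1], [1, 0, 0]]
print([round(r, 3) for r in pagerank(links)])  # [0.464, 0.05, 0.486]
```

Node 2 wins because two nodes point to it, while node 1, which nothing points to, keeps only the random-jump share. In LexRank the same computation runs on the sentence-similarity graph, where the edges are symmetric.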
-> Exercise results
1) Target original URL
2) Result
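For readers who want to see the whole pipeline in one place, below is a from-scratch sketch of the thresholded variant of LexRank: idf-modified cosine similarity between sentences, an unweighted graph linking sentence pairs above a threshold, then PageRank. It is a hypothetical illustration, not the notebook's implementation; the whitespace tokenizer, the +1 idf smoothing, and the threshold 0.1 are my assumptions.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase + whitespace split. Korean text would normally
    # go through a morphological analyzer instead.
    return sentence.lower().split()


def smoothed_idf(token_lists):
    # Smoothed idf (the +1 terms are an assumption, to avoid zero weights).
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}


def idf_cosine(a, b, idf):
    """idf-modified cosine similarity between two token lists."""
    tf_a, tf_b = Counter(a), Counter(b)
    num = sum(tf_a[w] * tf_b[w] * idf[w] ** 2 for w in tf_a if w in tf_b)
    norm_a = math.sqrt(sum((tf_a[w] * idf[w]) ** 2 for w in tf_a))
    norm_b = math.sqrt(sum((tf_b[w] * idf[w]) ** 2 for w in tf_b))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, k=2, threshold=0.1, damping=0.85, iterations=100):
    """Return the k top-ranked sentences, kept in their original order."""
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    idf = smoothed_idf(tokens)
    n = len(sentences)
    # Unweighted graph: link two different sentences if similarity > threshold.
    adj = [[1.0 if i != j and idf_cosine(tokens[i], tokens[j], idf) > threshold
            else 0.0 for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    ranks = [1.0 / n] * n
    for _ in range(iterations):  # PageRank by power iteration
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if degree[i]:
                    new[j] += damping * ranks[i] * adj[i][j] / degree[i]
                else:  # isolated sentence: spread its rank evenly
                    new[j] += damping * ranks[i] / n
        ranks = new
    best = sorted(range(n), key=lambda i: ranks[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


doc = [
    "pagerank scores graph nodes",
    "lexrank summarizes documents",
    "lexrank runs pagerank",
    "i had lunch today",
]
print(lexrank(doc, k=1))  # ['lexrank runs pagerank']
```

The third sentence shares a word with each of the first two, so it becomes the hub of the graph and ranks highest, while the unrelated lunch sentence stays isolated and ranks lowest.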

3. Appendix: how to use Google Colab (Google Colaboratory Notebook)
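1) A Google Colaboratory notebook is essentially a Python Jupyter notebook that runs in your browser on Google's servers.
+ Once you try it, you'll see it has plenty of advantages, such as no local installation and free GPU access.
2) First, click Connect (marked 1).
3) Then click each Play button (marked 2) one by one, from top to bottom.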
4) Done ;D
Pretty easy, right?~
