T5 (Text-to-Text Transfer Transformer) | KAITRUST AI Encyclopedia

📖 Detailed Explanation

T5 (Text-to-Text Transfer Transformer) is a large language model released by Google Research in 2019. Its core idea is to cast every NLP task into a single text-to-text format. Translation, for example, becomes "translate English to German: Hello" -> "Hallo", and sentiment analysis becomes "sentiment: This movie is great" -> "positive". With this unified approach, one model can handle dozens of NLP tasks.
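The text-to-text idea can be illustrated without loading a model. The sketch below only shows how a task might be reduced to a pair of strings; the `TextToTextExample` type and the `to_text_to_text` helper are hypothetical names introduced for illustration, not part of any library.

```python
from typing import NamedTuple


class TextToTextExample(NamedTuple):
    source: str  # string the encoder reads
    target: str  # string the decoder is trained to emit


def to_text_to_text(task_prefix: str, text: str, target: str) -> TextToTextExample:
    """Cast a task instance into a (source string, target string) pair."""
    return TextToTextExample(source=f"{task_prefix}: {text}", target=target)


# Two different tasks end up in exactly the same shape
examples = [
    to_text_to_text("translate English to German", "Hello", "Hallo"),
    to_text_to_text("sentiment", "This movie is great", "positive"),
]
for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Because every task is just a string pair, the same model, loss, and decoding procedure can be reused across tasks; only the prefix changes.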

T5 was introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". The Google research team built C4 (Colossal Clean Crawled Corpus), a dataset of about 750 GB, and systematically compared a range of pre-training objectives, model sizes, and training strategies. The study was an important contribution to transfer learning in NLP.

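T5 uses an encoder-decoder Transformer architecture. The encoder processes the input text and the decoder generates the output text. Pre-training uses span corruption: contiguous spans of the input are replaced with sentinel tokens, and the model learns to reconstruct the missing spans. Model sizes range from Small (60M parameters) to 11B parameters, and performance generally improves with size.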
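Span corruption is easier to see on a concrete sentence. The following is a minimal sketch, assuming whitespace tokens and hand-picked spans; in actual pre-training the spans are sampled at random over subword tokens (the paper's baseline corrupts roughly 15% of tokens). The `span_corrupt` helper is a hypothetical name, and the `<extra_id_N>` sentinel spelling follows the Hugging Face T5 vocabulary.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Replace each (start, length) span with a sentinel and build the target.

    Spans are assumed to be sorted by start index and non-overlapping.
    """
    source: List[str] = []
    target: List[str] = []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        source.extend(tokens[cursor:start])          # keep text before the span
        source.append(sentinel)                      # one sentinel per dropped span
        target.append(sentinel)
        target.extend(tokens[start:start + length])  # tokens the model must predict
        cursor = start + length
    source.extend(tokens[cursor:])                   # keep text after the last span
    target.append(f"<extra_id_{len(spans)}>")        # closing sentinel ends the target
    return " ".join(source), " ".join(target)


tokens = "Thank you for inviting me to your party last week .".split()
corrupted, target = span_corrupt(tokens, spans=[(2, 2), (8, 1)])
print(corrupted)  # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(target)     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The target contains only the dropped spans, not the whole sentence, which keeps the decoder's output short during pre-training.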

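T5 is an important milestone in LLM research. Later models such as FLAN-T5, UL2, and PaLM developed its ideas further. In practice, T5 is widely used for summarization, translation, and QA systems, and it is easy to access through Hugging Face. It also fine-tunes efficiently, which makes it a good fit for domain-specific tasks.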

💻 Code Example

An example of using T5 with Hugging Face Transformers.

# T5 usage example (requires the transformers, torch, and sentencepiece packages)
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load the model and tokenizer (checkpoints: t5-small, t5-base, t5-large, t5-3b, t5-11b)
model_name = "t5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

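def generate_text(input_text: str, max_length: int = 128) -> str:
    """Generate text with T5 for a prompt that starts with a task prefix."""
    # Collapse newlines and repeated spaces so the task prefix leads the prompt
    prompt = " ".join(input_text.split())
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

    outputs = model.generate(
        **inputs,                # input_ids and attention_mask
        max_length=max_length,   # upper bound on the generated length
        num_beams=4,             # beam search with 4 beams
        early_stopping=True,
        no_repeat_ngram_size=2,  # never repeat the same bigram
    )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)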

# ===== 1. Translation =====
print("=" * 50)
print("1. Translation")
print("=" * 50)

en_to_de = "translate English to German: The house is wonderful."
de_result = generate_text(en_to_de)
print(f"Input: {en_to_de}")
print(f"Output: {de_result}")

en_to_fr = "translate English to French: Machine learning is fascinating."
fr_result = generate_text(en_to_fr)
print(f"\nInput: {en_to_fr}")
print(f"Output: {fr_result}")

# ===== 2. Summarization =====
print("\n" + "=" * 50)
print("2. Summarization")
print("=" * 50)

article = """
summarize: The Amazon rainforest produces about 20% of the world's oxygen
and contains 10% of all species on Earth. Climate change and deforestation
pose serious threats to this vital ecosystem. Scientists warn that losing
the Amazon could accelerate global warming significantly. Conservation
efforts are critical for the planet's future.
"""
summary = generate_text(article, max_length=64)
print(f"Source text: {article.strip()}")
print(f"Summary: {summary}")

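# ===== 3. Grammatical acceptability (CoLA) =====
print("\n" + "=" * 50)
print("3. Grammatical acceptability (CoLA)")
print("=" * 50)

# The "cola sentence:" prefix selects the CoLA task, which labels a sentence as
# acceptable or unacceptable; it does not rewrite the sentence.
# Actual grammar correction needs FLAN-T5 or a model fine-tuned for that task.
cola_input = "cola sentence: He don't knows nothing about it."
cola_label = generate_text(cola_input)
print(f"Input: {cola_input}")
print(f"Output: {cola_label}")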

# ===== 4. Question Answering =====
print("\n" + "=" * 50)
print("4. Question Answering")
print("=" * 50)

qa_input = """
question: What is the capital of France?
context: France is a country in Western Europe. Its capital city is Paris,
which is known for the Eiffel Tower and the Louvre Museum.
"""
answer = generate_text(qa_input, max_length=32)
print(f"Question & context: {qa_input.strip()}")
print(f"Answer: {answer}")

# ===== 5. FLAN-T5 (instruction-tuned variant) =====
print("\n" + "=" * 50)
print("5. FLAN-T5 (Instruction-tuned)")
print("=" * 50)

# FLAN-T5 follows natural-language instructions better than the original T5
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

flan_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
flan_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

def flan_generate(prompt: str) -> str:
    """Generate a response from FLAN-T5 for a plain instruction (no task prefix)."""
    inputs = flan_tokenizer(prompt, return_tensors="pt")
    outputs = flan_model.generate(**inputs, max_length=100)
    return flan_tokenizer.decode(outputs[0], skip_special_tokens=True)

instruction = "Explain what machine learning is in simple terms."
response = flan_generate(instruction)
print(f"Instruction: {instruction}")
print(f"Response: {response}")

📊 Performance & Cost

T5 specs by model size

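Model	Parameters	Encoder/Decoder layers	VRAM required (approx.)	Typical use
T5-Small	60M	6/6	~1GB	Experiments, edge devices
T5-Base	220M	12/12	~2GB	General purpose, fast inference
T5-Large	770M	24/24	~4GB	High-quality summarization/translation
T5-3B	3B	24/24	~12GB	Production NLP
T5-11B	11B	24/24	~45GB	Research, best performance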

* FLAN-T5: the instruction-tuned variant, with better zero-shot performance at the same size
* Freely available on Hugging Face; there is no dedicated commercial API, so self-hosting is required
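As a rough cross-check of the VRAM column, weight memory can be estimated as parameter count times bytes per parameter. This is a back-of-the-envelope sketch, assuming the parameter counts from the table, 1 GB = 10^9 bytes, and weights only; activations, beam search state, and framework overhead come on top. The `weight_memory_gb` helper is a hypothetical name.

```python
# Approximate parameter counts, taken from the table above
PARAMS = {
    "T5-Small": 60e6,
    "T5-Base": 220e6,
    "T5-Large": 770e6,
    "T5-3B": 3e9,
    "T5-11B": 11e9,
}
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp32") -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in PARAMS.items():
    row = ", ".join(f"{d}: {weight_memory_gb(n, d):.2f} GB" for d in BYTES_PER_PARAM)
    print(f"{name:9s} {row}")
```

Under these assumptions the larger entries in the table sit close to the fp32 weight size (about 12 GB for T5-3B and 44 GB for T5-11B), which suggests that half precision or INT8 would cut the requirement to roughly a half or a quarter.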

🗣️ How to Talk About It at Work

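💬 In a meeting

"We're evaluating summarization models, and I'd suggest starting with T5-base. It's lightweight at 220M parameters. For Korean we'd need mT5, though quality may be lower than in English. FLAN-T5 follows instructions better, so let's compare the two."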

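💬 In an interview

"T5's key innovation is the text-to-text framework. Classification, generation, and QA are all handled the same way, so multi-task learning comes naturally. GPT uses only a decoder, whereas T5 is an encoder-decoder, which makes it a better fit for conditional generation tasks."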

💬 In a technical discussion

"When you fine-tune T5, try the prefix approach. If you prepend a task prefix such as 'summarize:' or 'translate English to Korean:', a single model can handle several tasks. Applying LoRA also saves memory."

⚠️ Common Mistakes & Caveats

❌

Using T5 for Korean
The original T5 was trained mostly on English, so its Korean performance is poor. For Korean tasks, use mT5 (multilingual) or KE-T5 (Korean-specialized) instead.

❌

Using T5 for chat/dialogue
T5 is optimized for single-turn tasks. If you need multi-turn dialogue or long-context retention, GPT-family models or LLaMA are a better fit.

✅

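The right approach
1) Use T5 for tasks with clear inputs and outputs, such as summarization, translation, and QA; 2) use FLAN-T5 for zero-shot performance; 3) fine-tune for domain-specific work; 4) apply quantization (INT8) when a large model is needed.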
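For the fine-tuning step, the saving from LoRA (mentioned in the technical-discussion tip) can be estimated with simple arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: T5-Base with a hidden size of 768, 12 encoder and 12 decoder layers, LoRA of rank 8 applied only to the query and value projections of every attention block, and the 220M total from the specs table. The `lora_params` helper is a hypothetical name.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Extra parameters LoRA adds to one linear layer: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed T5-Base shape and LoRA settings (see the lead-in)
D_MODEL = 768
ENCODER_LAYERS = 12
DECODER_LAYERS = 12
RANK = 8
ADAPTED_PER_ATTENTION = 2  # query and value projections only

# Each encoder layer has self-attention; each decoder layer has self- and cross-attention
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
trainable = attention_blocks * ADAPTED_PER_ATTENTION * lora_params(D_MODEL, D_MODEL, RANK)

total = 220e6  # approximate T5-Base size from the specs table
print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of the base model:   {trainable / total:.2%}")
```

Since only the adapter weights need gradients and optimizer state, the training-time memory for those buffers shrinks accordingly; the frozen base weights still have to fit in memory.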

📚 Learn More

📝 Original T5 paper (arXiv) 📄 Hugging Face T5 documentation 🎓 Google T5 GitHub repository