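Feature Store | KAITRUST AI Encyclopedia

📖 Detailed Explanation

A Feature Store is a system that centrally manages the features used in ML workflows and provides the same, consistent features to both training and serving. It addresses feature reuse, training-serving consistency (preventing training-serving skew), feature versioning, and feature discovery.

The need for a Feature Store becomes clearer as ML projects mature. Early on, features are computed directly in Jupyter notebooks, but as the number of models grows, so does duplicated work. For example, a feature such as "the user's number of purchases in the last 30 days" is needed, with the same definition, by recommendation, churn prediction, and fraud detection models.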

Feature Store Architecture (the Feature Registry, Transformation Engine, Offline Store, and Online Store make up the Feature Store; the Training Pipeline and Serving consume it)

Feature Registry (metadata, schema, versions)      ◀── Data Sources (DB, Kafka, S3, API)
        │
        ▼
Transformation Engine (feature computation logic)  ◀── Feature Engineering Pipelines (Spark, Flink)
        │
┌───────┴───────┐
▼               ▼
Offline Store   Online Store
(Parquet,       (Redis,
BigQuery)       DynamoDB)
│               │
▼               ▼
Training        Serving
Pipeline        (Real-time
(Batch)         Inference)

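The core components are:

Offline Store: stores large volumes of historical data for training, using analytical storage such as Parquet, Delta Lake, or BigQuery. Point-in-time joins retrieve feature values exactly as they were at past points in time.
Online Store: stores the latest feature values for real-time inference, using low-latency key-value stores such as Redis or DynamoDB. A typical target is p99 latency under 10 ms.
Feature Registry: manages feature metadata (schemas, versions, owners, tags, and dependencies) and supports feature discovery and governance.
Transformation Engine: executes feature computation logic, supporting both batch pipelines (Spark) and streaming pipelines (Flink, Kafka Streams).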
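To make the division of labor concrete, here is a minimal in-memory sketch of the four components, assuming a single process and plain dictionaries as storage. `ToyFeatureStore`, `FeatureDefinition`, and their methods are hypothetical names invented for illustration; they are not the API of Feast or any other product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class FeatureDefinition:
    """Registry entry: metadata plus the transformation that computes the feature."""
    name: str
    owner: str
    version: int
    transform: Callable[[Dict[str, Any]], Any]
    tags: Dict[str, str] = field(default_factory=dict)


class ToyFeatureStore:
    """Hypothetical single-process feature store (illustration only)."""

    def __init__(self) -> None:
        self.registry: Dict[str, FeatureDefinition] = {}  # Feature Registry
        # Offline store: full history per (entity_id, feature) as (timestamp, value) rows
        self.offline: Dict[Tuple[str, str], List[Tuple[datetime, Any]]] = {}
        # Online store: only the latest value per (entity_id, feature)
        self.online: Dict[Tuple[str, str], Any] = {}

    def register(self, definition: FeatureDefinition) -> None:
        self.registry[definition.name] = definition

    def ingest(self, entity_id: str, raw: Dict[str, Any], ts: datetime) -> None:
        """Transformation engine: run every registered transform and append to history."""
        for name, definition in self.registry.items():
            value = definition.transform(raw)
            self.offline.setdefault((entity_id, name), []).append((ts, value))

    def materialize(self) -> None:
        """Copy the most recent offline value of each feature into the online store."""
        for key, history in self.offline.items():
            _, latest_value = max(history, key=lambda row: row[0])
            self.online[key] = latest_value

    def get_online(self, entity_id: str, name: str) -> Optional[Any]:
        """Low-latency lookup path used at inference time."""
        return self.online.get((entity_id, name))


store = ToyFeatureStore()
store.register(FeatureDefinition(
    name="purchase_count_30d", owner="ml-platform", version=1,
    transform=lambda raw: len(raw["purchases_30d"]),
))
store.ingest("u1", {"purchases_30d": [10, 20]}, datetime(2024, 1, 1))
store.ingest("u1", {"purchases_30d": [10, 20, 30]}, datetime(2024, 1, 2))
store.materialize()
print(store.get_online("u1", "purchase_count_30d"))  # prints 3
```

The offline store keeps both rows for building training data, while the online store holds only the newest value for serving.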

Representative solutions include the open-source Feast, the commercial Tecton, and managed cloud offerings (Databricks, AWS SageMaker, and Vertex AI Feature Store).

🎯 Key Benefits

🔄 Feature Reuse

A feature defined once is shared across multiple models. Removing duplicate computation cuts costs and keeps values consistent.
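A minimal sketch of "define once, reuse everywhere", assuming an in-memory purchase log and `functools.lru_cache` standing in for the store's shared, precomputed values. The log, `purchase_count_30d`, `MODEL_FEATURES`, and `build_vector` are hypothetical names. Three models request the feature by name, and it is computed only once for the same user and date.

```python
from datetime import date, timedelta
from functools import lru_cache
from typing import Dict, List

# Hypothetical purchase log: (user_id, purchase_date)
PURCHASES = [
    ("u1", date(2024, 1, 2)),
    ("u1", date(2024, 1, 20)),
    ("u1", date(2023, 11, 1)),
    ("u2", date(2024, 1, 25)),
]

compute_calls = 0  # counts how often the feature is actually computed


@lru_cache(maxsize=None)
def purchase_count_30d(user_id: str, as_of: date) -> int:
    """The single shared definition of 'purchases in the 30 days up to as_of'."""
    global compute_calls
    compute_calls += 1
    window_start = as_of - timedelta(days=30)
    return sum(1 for uid, d in PURCHASES if uid == user_id and window_start < d <= as_of)


FEATURES = {"purchase_count_30d": purchase_count_30d}

# Each model declares the features it needs by name instead of re-implementing them.
MODEL_FEATURES: Dict[str, List[str]] = {
    "recommendation": ["purchase_count_30d"],
    "churn": ["purchase_count_30d"],
    "fraud": ["purchase_count_30d"],
}


def build_vector(model: str, user_id: str, as_of: date) -> List[int]:
    return [FEATURES[name](user_id, as_of) for name in MODEL_FEATURES[model]]


vectors = {model: build_vector(model, "u1", date(2024, 1, 31)) for model in MODEL_FEATURES}
print(vectors, compute_calls)  # prints {'recommendation': [2], 'churn': [2], 'fraud': [2]} 1
```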

🎯 Training-Serving Consistency

Offline and online features are generated from the same feature definition, preventing the model performance degradation that skew causes.
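A hedged sketch in which the batch (training) path and the request (serving) path both call one shared function; all names are hypothetical. A common source of skew is exactly the edge case handled here: if the batch job returned 0.0 for a user with no orders while the serving code returned None, the model would see different inputs in production than it did in training.

```python
from typing import Dict, List, Optional


def avg_order_value(amounts: List[float]) -> float:
    """Single shared feature definition used by both paths below."""
    # The empty-history case is decided once, here, so both paths agree on it.
    if not amounts:
        return 0.0
    return round(sum(amounts) / len(amounts), 2)


def offline_features(orders: Dict[str, List[float]]) -> Dict[str, float]:
    """Batch path: compute the feature for every user when building training data."""
    return {user: avg_order_value(amounts) for user, amounts in orders.items()}


def online_feature(recent_orders: Optional[List[float]]) -> float:
    """Serving path: compute the feature for a single request."""
    return avg_order_value(recent_orders or [])


orders = {"u1": [30.0, 50.0, 10.0], "u2": []}
training = offline_features(orders)
serving = {user: online_feature(amounts) for user, amounts in orders.items()}
print(training == serving)  # prints True
```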

⏱️ Point-in-Time Join

Retrieves feature values exactly as they were at a specific past time, which prevents data leakage.
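A point-in-time join can be sketched as an as-of lookup: for each label row, take the latest feature value whose timestamp is at or before the row's timestamp. The sketch below does this with binary search over an in-memory history, similar in spirit to pandas `merge_asof` with a backward direction. The history data is hypothetical, and the optional `ttl` argument is an assumption that loosely mirrors a feature view's TTL by treating values older than it as missing.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: user_id -> [(time the value became valid, value)], sorted by time
History = Dict[int, List[Tuple[datetime, int]]]


def point_in_time_value(history: History, user_id: int, as_of: datetime,
                        ttl: Optional[timedelta] = None) -> Optional[int]:
    """Latest value with timestamp <= as_of (never a value from the future)."""
    rows = history.get(user_id, [])
    idx = bisect_right([ts for ts, _ in rows], as_of)  # rows[:idx] are at or before as_of
    if idx == 0:
        return None  # no value existed yet at as_of
    ts, value = rows[idx - 1]
    if ttl is not None and as_of - ts > ttl:
        return None  # the value existed but was older than the TTL
    return value


history: History = {
    1001: [(datetime(2024, 1, 14), 3), (datetime(2024, 1, 16), 5)],
}
labels = [(1001, datetime(2024, 1, 15, 10, 0)), (1001, datetime(2024, 1, 17, 8, 0))]
joined = [(uid, ts, point_in_time_value(history, uid, ts)) for uid, ts in labels]
# The Jan 15 label row gets 3, not the later value 5, so no future data leaks in.
print([value for _, _, value in joined])  # prints [3, 5]
```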

🔍 Feature Discovery

Lets teams find the features already available across the organization. A catalog and search encourage reuse.
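A hedged sketch of catalog search, assuming registry entries have already been loaded into plain dictionaries. The catalog contents and the `search` function are hypothetical; the `team` and `domain` tag keys simply reuse the tags from the Feast example in this article.

```python
from typing import Dict, List

# Hypothetical catalog entries; in practice these would be read from the feature registry.
CATALOG: List[Dict] = [
    {"name": "user_behavior:purchase_count_30d",
     "description": "Number of purchases in the last 30 days",
     "tags": {"team": "ml-platform", "domain": "user-behavior"}},
    {"name": "user_behavior:avg_order_value",
     "description": "Average order value",
     "tags": {"team": "ml-platform", "domain": "user-behavior"}},
    {"name": "risk:chargeback_rate_90d",
     "description": "Share of orders charged back in the last 90 days",
     "tags": {"team": "risk", "domain": "payments"}},
]


def search(catalog: List[Dict], text: str = "", **tag_filters: str) -> List[str]:
    """Names whose name or description contains `text` and whose tags match every filter."""
    text = text.lower()
    hits = []
    for entry in catalog:
        haystack = f"{entry['name']} {entry['description']}".lower()
        tags_match = all(entry["tags"].get(k) == v for k, v in tag_filters.items())
        if text in haystack and tags_match:
            hits.append(entry["name"])
    return hits


print(search(CATALOG, "order"))                 # free-text search
print(search(CATALOG, domain="user-behavior"))  # tag-based browsing
```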

📋 Version Management

Tracks the history of feature schema changes, making it possible to roll back to a previous version.
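Version tracking with rollback can be sketched as an append-only list of schemas, assuming each schema is a column-to-type mapping. `VersionedSchema` is a hypothetical class, not a registry API. Rolling back re-publishes an old schema as a new version instead of deleting history, and `diff` flags removed columns and type changes, which are the kinds of change that break existing models.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionedSchema:
    """Hypothetical registry entry that keeps every published schema of one feature view."""
    name: str
    history: List[Dict[str, str]] = field(default_factory=list)  # each item maps column -> dtype

    def publish(self, schema: Dict[str, str]) -> int:
        """Append a new schema version and return its 1-based version number."""
        self.history.append(dict(schema))
        return len(self.history)

    def current(self) -> Dict[str, str]:
        return self.history[-1]

    def rollback(self, version: int) -> int:
        """Re-publish an earlier version as the newest one; history is never rewritten."""
        return self.publish(self.history[version - 1])

    def diff(self, old: int, new: int) -> Dict[str, List[str]]:
        a, b = self.history[old - 1], self.history[new - 1]
        return {
            "added": sorted(set(b) - set(a)),
            "removed": sorted(set(a) - set(b)),
            "type_changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
        }


fv = VersionedSchema("user_behavior")
v1 = fv.publish({"purchase_count_30d": "Int64", "avg_order_value": "Float32"})
v2 = fv.publish({"purchase_count_30d": "Float32", "total_amount_30d": "Float32"})
# prints {'added': ['total_amount_30d'], 'removed': ['avg_order_value'], 'type_changed': ['purchase_count_30d']}
print(fv.diff(v1, v2))
v3 = fv.rollback(v1)
print(v3, fv.current() == fv.history[v1 - 1])  # prints 3 True
```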

📊 Monitoring

Detects changes in feature distributions (drift) so that data quality problems are caught early.
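The paragraph does not name a drift metric; the sketch below uses the Population Stability Index (PSI), one common choice, as an assumption. It bins a baseline sample (for example, feature values at training time) and a current sample with the same edges and compares the bin shares. The data and bin edges are made up. Rules of thumb often treat a PSI above roughly 0.25 as a significant shift, but any alert threshold should be calibrated per feature.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; bins are (-inf, e0), [e0, e1), ..., [e_last, inf)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    # eps keeps empty bins from producing log(0)
    return [max(c / len(values), eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], edges: Sequence[float]) -> float:
    """Population Stability Index between a baseline sample and a current sample."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


training_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # e.g. purchase_count_30d at training time
serving_values = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]  # the same feature observed in serving
edges = [4, 8, 12]
print(round(psi(training_values, training_values, edges), 3))  # prints 0.0
print(psi(training_values, serving_values, edges) > 0.25)      # prints True
```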

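📊 Feature Store Solution Comparison

Solution	Type	Online store	Offline store	Streaming	Highlights
Feast	Open source	Redis, DynamoDB, etc.	Files, BigQuery, etc.	Limited	Lightweight, supports many backends
Tecton	Commercial	Managed	Managed	Native	Enterprise-grade, specialized for real time
Databricks FS	Cloud	Managed (governed via Unity Catalog)	Delta Lake	Via Spark	Integrated with the Databricks ecosystem
SageMaker FS	Cloud	Managed	S3 (Parquet)	Limited	Integrated with the AWS ecosystem
Vertex AI FS	Cloud	Bigtable	BigQuery	Via Dataflow	Integrated with the GCP ecosystem
Hopsworks	Open source/commercial	RonDB	Hudi	Native	Time-series focus, Python SDK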

💻 Code Examples

# Basic Feast Feature Store setup (Python)

# feature_repo/entities.py - Entity definitions
from feast import Entity

# Entity: the key that features are attached to
user = Entity(
    name="user_id",
    description="Unique user identifier",
    join_keys=["user_id"]
)

product = Entity(
    name="product_id",
    description="Product SKU",
    join_keys=["product_id"]
)

# feature_repo/feature_views.py - Feature View definitions
from datetime import timedelta
from feast import FeatureView, Field, FileSource
from feast.types import Float32, Int64, String
from entities import user  # Entity defined in feature_repo/entities.py

# Data source definition
user_stats_source = FileSource(
    path="data/user_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp"
)

# Feature View: defines a group of features
user_behavior_fv = FeatureView(
    name="user_behavior",
    entities=[user],
    ttl=timedelta(days=1),  # max age of a feature value; limits lookback in point-in-time joins
    schema=[
        Field(name="purchase_count_7d", dtype=Int64),
        Field(name="purchase_count_30d", dtype=Int64),
        Field(name="total_amount_30d", dtype=Float32),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_purchase", dtype=Int64),
        Field(name="favorite_category", dtype=String),
    ],
    source=user_stats_source,
    online=True,  # sync to the online store
    tags={
        "team": "ml-platform",
        "domain": "user-behavior",
        "tier": "gold"
    }
)

# feature_store.yaml - configuration file
"""
project: ecommerce_ml
registry: s3://my-bucket/feast/registry.db
provider: aws
online_store:
  type: redis
  connection_string: redis-cluster.abc123.cache.amazonaws.com:6379
offline_store:
  type: file
entity_key_serialization_version: 2
"""

# Register the definitions in the registry
# CLI: feast apply

from feast import FeatureStore

fs = FeatureStore(repo_path="feature_repo/")

# List the registered feature views
for fv in fs.list_feature_views():
    print(f"Feature View: {fv.name}")
    print(f"  Entities: {fv.entities}")  # entity names (strings)
    print(f"  Features: {[f.name for f in fv.features]}")

# Generating training data - point-in-time join
from feast import FeatureStore
import pandas as pd
from datetime import datetime, timedelta

fs = FeatureStore(repo_path="feature_repo/")

# Label data for training (entity + timestamp + label)
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003, 1004, 1005],
    "event_timestamp": [
        datetime(2024, 1, 15, 10, 0),
        datetime(2024, 1, 15, 11, 30),
        datetime(2024, 1, 16, 9, 0),
        datetime(2024, 1, 16, 14, 0),
        datetime(2024, 1, 17, 8, 0),
    ],
    "churned": [0, 1, 0, 1, 0]  # label
})

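# Point-in-time join: fetch the feature values that were valid at each row's timestamp
# Values recorded after that timestamp are never joined, which prevents data leakage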
training_df = fs.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_behavior:purchase_count_7d",
        "user_behavior:purchase_count_30d",
        "user_behavior:total_amount_30d",
        "user_behavior:avg_order_value",
        "user_behavior:days_since_last_purchase",
    ]
).to_df()

print("Training data:")
print(training_df)

# Train the model
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

feature_cols = [
    "purchase_count_7d", "purchase_count_30d",
    "total_amount_30d", "avg_order_value",
    "days_since_last_purchase"
]

X = training_df[feature_cols].fillna(0)
y = training_df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Sync features to the online store (materialization)
from datetime import datetime

fs.materialize(
    start_date=datetime(2024, 1, 1),
    end_date=datetime.now()
)
print("Features have been synced to the online store.")

# Online serving (real-time inference)
online_features = fs.get_online_features(
    features=[
        "user_behavior:purchase_count_7d",
        "user_behavior:purchase_count_30d",
        "user_behavior:total_amount_30d",
        "user_behavior:avg_order_value",
        "user_behavior:days_since_last_purchase",
    ],
    entity_rows=[{"user_id": 1001}]
).to_dict()

# Predict: reuse the training column order and null handling (fillna(0)) to avoid skew
X_online = pd.DataFrame(online_features)[feature_cols].fillna(0)
prediction = model.predict(X_online)
print(f"Churn prediction: {'at risk of churn' if prediction[0] == 1 else 'normal'}")

# Streaming features (real-time feature updates)
# Integration examples using Tecton, or Flink + Feast

# Option 1: real-time feature updates with the Push API (Feast)
from feast import FeatureStore, PushSource, FeatureView, Field
from feast.types import Float32, Int64
from datetime import timedelta

# Push source definition (real-time data input)
user_activity_push_source = PushSource(
    name="user_activity_push",
    batch_source=...,  # placeholder: batch source used for backfills (e.g. a FileSource)
)

# Real-time Feature View
user_realtime_fv = FeatureView(
    name="user_realtime_activity",
    entities=[user],  # the Entity object from entities.py, not the string "user_id"
    ttl=timedelta(minutes=30),
    schema=[
        Field(name="clicks_last_5min", dtype=Int64),
        Field(name="page_views_last_5min", dtype=Int64),
        Field(name="cart_value", dtype=Float32),
    ],
    source=user_activity_push_source,
    online=True,
)

# Push after stream processing
from feast import FeatureStore
from feast.data_source import PushMode
import pandas as pd
from datetime import datetime

fs = FeatureStore(repo_path="feature_repo/")

# Process an event from a Kafka consumer, then push the features
def process_event(event):
    """Process a Kafka event and push the resulting features to the Feature Store"""
    user_id = event["user_id"]

    # Aggregation logic (e.g. windowed aggregates kept in Redis);
    # aggregate_user_activity is a hypothetical helper, not part of Feast
    aggregated_features = aggregate_user_activity(user_id)

    feature_df = pd.DataFrame({
        "user_id": [user_id],
        "clicks_last_5min": [aggregated_features["clicks"]],
        "page_views_last_5min": [aggregated_features["page_views"]],
        "cart_value": [aggregated_features["cart_value"]],
        "event_timestamp": [datetime.now()],
    })

    # Push directly to the online store (use the PushMode enum, not the string "online")
    fs.push(
        push_source_name="user_activity_push",
        df=feature_df,
        to=PushMode.ONLINE
    )

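# Option 2: Flink + Feature Store integration pattern
"""
Flink job (PyFlink or Java):
1. Consume clickstream data from Kafka
2. Aggregate over windows (5-minute tumbling window)
3. Write the aggregates directly to Redis/DynamoDB (keys and value encoding must match what the feature store reads)
4. Or send them through the Feast Push API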

┌──────────┐     ┌──────────┐     ┌──────────────┐     ┌──────────────┐
│  Kafka   │────▶│  Flink   │────▶│   Aggregator │────▶│ Online Store │
│ (Events) │     │  Window  │     │  (5min sum)  │     │   (Redis)    │
└──────────┘     └──────────┘     └──────────────┘     └──────────────┘
"""

# Option 3: On-demand features (computed at request time)
from feast import on_demand_feature_view
from feast.types import Bool

@on_demand_feature_view(
    sources=[user_behavior_fv],
    schema=[Field(name="is_high_value_customer", dtype=Bool)]  # Feast type, not Python bool
)
def compute_customer_tier(inputs: pd.DataFrame) -> pd.DataFrame:
    """Feature computed dynamically at request time"""
    df = pd.DataFrame()
    df["is_high_value_customer"] = (
        inputs["total_amount_30d"] > 1000
    ) & (
        inputs["purchase_count_30d"] > 5
    )
    return df

# Using the on-demand feature
online_features = fs.get_online_features(
    features=[
        "user_behavior:total_amount_30d",
        "compute_customer_tier:is_high_value_customer",
    ],
    entity_rows=[{"user_id": 1001}]
).to_dict()
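The Flink pattern above relies on a 5-minute tumbling window. The pure-Python sketch below shows only the windowing logic, assuming events arrive as in-memory tuples and ignoring late data and watermarks, which a real Flink job would have to handle; it is not the Flink API. Something like it could stand in for the hypothetical `aggregate_user_activity` helper in the push example, and its output columns reuse the `user_realtime_activity` schema names (`cart_value` is omitted).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)

Event = Tuple[str, datetime, str]  # (user_id, event time, event kind)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Align a timestamp to the start of its tumbling window (windows never overlap)."""
    return ts - (ts - EPOCH) % size


def tumbling_counts(events: Iterable[Event]) -> Dict[Tuple[str, datetime], Dict[str, int]]:
    """Count clicks and page views per (user_id, window start)."""
    agg: Dict[Tuple[str, datetime], Dict[str, int]] = defaultdict(
        lambda: {"clicks_last_5min": 0, "page_views_last_5min": 0}
    )
    for user_id, ts, kind in events:
        counts = agg[(user_id, window_start(ts))]
        if kind == "click":
            counts["clicks_last_5min"] += 1
        elif kind == "page_view":
            counts["page_views_last_5min"] += 1
    return dict(agg)


def to_push_rows(agg: Dict[Tuple[str, datetime], Dict[str, int]]) -> List[dict]:
    """One row per window, stamped with the window end as event_timestamp."""
    return [
        {"user_id": user_id, "event_timestamp": start + WINDOW, **counts}
        for (user_id, start), counts in sorted(agg.items())
    ]


events = [
    ("u1", datetime(2024, 1, 15, 10, 1), "click"),
    ("u1", datetime(2024, 1, 15, 10, 4, 59), "page_view"),
    ("u1", datetime(2024, 1, 15, 10, 5), "click"),  # falls into the next window
    ("u2", datetime(2024, 1, 15, 10, 2), "click"),
]
for row in to_push_rows(tumbling_counts(events)):
    print(row)
```

Rows in this shape could be turned into a DataFrame and passed to the push call shown earlier, assuming the push source's schema matches.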

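🗣️ How to Say It at Work

💬 In an architecture meeting

"The recommendation model and the churn prediction model each compute the same user features separately. Introducing a Feature Store would remove that duplication and guarantee consistency. I suggest we start with Feast and migrate to Tecton as we scale."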

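💬 In a discussion with data engineers

"Latency matters for online serving, and with a Redis online store we get under 10 ms at p99. Batch features are fast enough as they are; if we need real-time aggregation, we can add a Flink pipeline and connect it through the Push API."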

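💬 In a job interview

"Training-serving skew is the model performance degradation that occurs when feature computation logic differs between training and serving. With a Feature Store, offline and online features are generated from the same feature definitions, which keeps them consistent. Feast's point-in-time join, in particular, also prevents data leakage automatically."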

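💬 In a code review

"This feature definition has no version information. If the feature schema changes, it can break compatibility with existing models. Instead of only appending v2 to the feature name, let's also record the version and status in the feature view's tags so the registry makes them visible."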

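💬 In a cost-optimization meeting

"Our feature reuse rate is only 30% right now. If we strengthen the Feature Store's discovery features and share per-team feature catalogs, we can substantially cut the cost of duplicate computation. We expect to save about KRW 5 million a month on Spark pipeline costs alone."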

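⚠️ Common Mistakes & Cautions

❌

No monitoring of materialization

If materialization (syncing to the online store) fails, the features used for serving go stale. Always set up pipeline monitoring and alerts in an orchestrator such as Airflow or Dagster.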
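A minimal freshness check, assuming you can obtain the completion time of the last successful materialization run per feature view (for example from your orchestrator's run metadata). `stale_feature_views`, the view names, and the 1.5x grace factor are hypothetical choices; the print stands in for a real alerting hook.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_feature_views(last_success: Dict[str, datetime],
                        expected_interval: Dict[str, timedelta],
                        now: datetime,
                        grace: float = 1.5) -> List[str]:
    """Names of views whose last successful materialization is older than grace x interval."""
    return sorted(
        name for name, finished_at in last_success.items()
        if now - finished_at > expected_interval[name] * grace
    )


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_success = {
    "user_behavior": now - timedelta(hours=2),     # hourly job that missed a run
    "product_stats": now - timedelta(minutes=20),  # 15-minute job, still on time
}
expected_interval = {
    "user_behavior": timedelta(hours=1),
    "product_stats": timedelta(minutes=15),
}
for name in stale_feature_views(last_success, expected_interval, now):
    # A real pipeline would notify someone here (e.g. Slack or PagerDuty).
    print(f"ALERT: {name} has not been materialized recently")
```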

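❌

Inadequate feature versioning

Changing a feature's schema causes compatibility problems with existing models. Define breaking changes under a new feature name, and mark the old feature with a deprecated tag.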
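A hedged sketch of enforcing this in CI: scan a model's feature references and flag any that belong to a feature view tagged as deprecated. The `status` and `replaced_by` tag keys are an assumed team convention, not built-in Feast semantics (Feast tags are free-form string pairs on a feature view), and the tag snapshot is hypothetical.

```python
from typing import Dict, List

# Hypothetical snapshot of feature-view tags, keyed by feature view name.
VIEW_TAGS: Dict[str, Dict[str, str]] = {
    "user_behavior": {"status": "deprecated", "replaced_by": "user_behavior_v2"},
    "user_behavior_v2": {"status": "active"},
}


def deprecated_refs(feature_refs: List[str]) -> Dict[str, str]:
    """Map each deprecated 'view:feature' reference to its suggested replacement."""
    suggestions = {}
    for ref in feature_refs:
        view, feature = ref.split(":", 1)
        tags = VIEW_TAGS.get(view)
        if tags is None:
            raise KeyError(f"unknown feature view: {view}")
        if tags.get("status") == "deprecated":
            suggestions[ref] = f"{tags['replaced_by']}:{feature}"
    return suggestions


model_features = ["user_behavior:avg_order_value", "user_behavior_v2:purchase_count_30d"]
for old, new in deprecated_refs(model_features).items():
    print(f"{old} is deprecated; migrate to {new}")
```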

❌

TTL misconfiguration

If the online store TTL is too short, features expire and null is returned. Analyze your feature freshness requirements and set a TTL longer than the materialization interval.
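A simplified model of this failure mode, assuming an online read returns nothing once a value is older than the TTL. Whether and how a TTL is enforced on online reads differs across Feast versions and online-store backends, so treat this as an illustration of the rule of thumb rather than of any specific store. `online_read` and `ttl_is_safe` are hypothetical helpers, and the `max_job_delay` margin is an added assumption for jobs that run late.

```python
from datetime import datetime, timedelta
from typing import Optional


def online_read(value: float, written_at: datetime, now: datetime, ttl: timedelta) -> Optional[float]:
    """Simplified online read: the value is returned only while it is younger than the TTL."""
    return value if now - written_at <= ttl else None


def ttl_is_safe(ttl: timedelta, materialization_interval: timedelta, max_job_delay: timedelta) -> bool:
    """The TTL must outlast the gap between refreshes, including a late-running job."""
    return ttl > materialization_interval + max_job_delay


written = datetime(2024, 1, 15, 10, 0)     # last hourly materialization
read_at = written + timedelta(minutes=45)  # request arriving before the next run

print(online_read(0.8, written, read_at, ttl=timedelta(minutes=30)))  # prints None
print(online_read(0.8, written, read_at, ttl=timedelta(hours=3)))     # prints 0.8
print(ttl_is_safe(timedelta(minutes=30), timedelta(hours=1), timedelta(minutes=15)))  # prints False
print(ttl_is_safe(timedelta(hours=3), timedelta(hours=1), timedelta(minutes=15)))     # prints True
```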

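❌

Excessive complexity

Introducing a Feature Store when you have only 2 or 3 models is over-engineering. Consider adopting one when you have 10 or more models and a clear need for feature reuse.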

✅

The right approach

Start with a file-based offline store, and add a Redis online store when you need real-time serving. Establish naming conventions and a tagging scheme for feature reuse first.

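🔥 Learning from Real-World Cases

📌 Uber - Michelangelo Feature Store USA | Large-scale ML platform | Feature Store pioneer

In 2017, Uber developed a Feature Store as a core component of its in-house ML platform, Michelangelo. Thousands of ML models share the same features, which are served to real-time requests with millisecond-level latency. The online and offline stores were built on Cassandra and Hive, respectively.

Results: a large increase in feature reuse, 50% shorter ML model development time, and elimination of bugs caused by training-serving skew. Lesson: a Feature Store is core infrastructure that raises the maturity of an ML organization.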

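📌 Spotify - ML Feature Platform Sweden | Music streaming | Personalized recommendations

Spotify runs its own Feature Store to deliver personalized music recommendations to hundreds of millions of users. It manages features such as users' listening history, playlist information, and audio characteristics, using GCP BigQuery and Bigtable. Real-time listening-context features are especially important to recommendation quality.

Results: billions of feature lookups served every week and steadily improving recommendation click-through rates. Lesson: real-time context features (the track currently playing, the time of day, and so on) can have a bigger impact than batch features.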

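📌 DoorDash - Fabricator Feature Store USA | Food delivery | Real-time prediction

DoorDash uses ML for delivery time prediction, pricing optimization, fraud detection, and more. Its in-house Fabricator Feature Store achieves p50 latency of 5 ms and p99 latency of 15 ms from a Redis-cluster-based online store, and it handles millions of feature requests per second.

Results: more accurate delivery time predictions, and higher order conversion from real-time pricing optimization. Lesson: in latency-sensitive domains such as food delivery, online store performance is critical.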

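📝 Comprehension Quiz

Q1. What is the main difference between a Feature Store's Online Store and Offline Store?

A) The Online Store stores more data

B) The Online Store is for low-latency real-time serving; the Offline Store is for large-scale batch training

C) The Online Store is free; the Offline Store is paid

D) Both use the same storage

✅ Answer: B) The Online Store uses a low-latency key-value store such as Redis or DynamoDB to provide the latest features for real-time inference (typically targeting p99 under 10 ms). The Offline Store uses analytical storage such as Parquet or BigQuery to generate large training datasets.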

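Q2. What is training-serving skew, and how does a Feature Store solve it?

A) A memory problem caused by training data that is too large

B) Performance degradation caused by feature computation logic that differs between training and serving; solved by using the same feature definitions

C) Model training taking too long

D) Rising server load

✅ Answer: B) Training-serving skew occurs when, for example, features computed in Python at training time and features computed in Java at serving time differ in subtle ways, degrading model performance. A Feature Store solves this by generating both offline and online features from the same feature definitions.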

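Q3. Why is a point-in-time join important?

A) To improve join performance

B) To save storage space

C) To retrieve accurate features as of past points in time, preventing data leakage

D) To speed up real-time serving

✅ Answer: C) When generating training data, a point-in-time join retrieves only the feature values that existed as of each sample's timestamp. This prevents data leakage, where future data ends up in training. Example: when predicting for January 15, data from January 16 must not be used.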

📚 Further Reading

📄 Feast official documentation
🎓 Tecton documentation
📝 Databricks Feature Store
☁️ AWS SageMaker Feature Store
🌐 Vertex AI Feature Store
📚 Hopsworks Feature Store guide