Output Parsers and Structured Responses

Learn how to convert an LLM's response into the format you need, such as a plain string or JSON.

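Learning Objectives

Convert an AIMessage to a string with StrOutputParser
Understand the limitations of JsonOutputParser
Get reliable structured responses with a Pydantic model and with_structured_output
Learn to keep output minimal through prompt design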

1. Why do we need an output parser?

Calling invoke on an LLM returns an AIMessage object, but what you actually need is the text inside it.

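python

from langchain_ollama import ChatOllama

# Same local model that section 4.2 uses; any chat model works here
llm = ChatOllama(model="llama3.2:1b")

response = llm.invoke("What is the capital of France?")
print(type(response))  # <class 'langchain_core.messages.ai.AIMessage'>
print(response.content)  # "The capital of France is Paris."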

For example, if a Python server has to pass this result to a JavaScript frontend, it cannot send the AIMessage object as-is. The result has to be converted to a string first.
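
The point above can be sketched with the standard library alone. `FakeAIMessage` is a hypothetical stand-in for AIMessage, introduced only for this illustration; it is not a LangChain class. The sketch shows that a custom Python object is not JSON-serializable by default, while the text it carries is.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for AIMessage, used only for this illustration."""

    content: str


message = FakeAIMessage(content="The capital of France is Paris.")

# A custom Python object is not JSON-serializable by default.
try:
    json.dumps(message)
    serializable = True
except TypeError:
    serializable = False

# The plain text inside the message serializes without any extra work.
payload = json.dumps({"answer": message.content})

print(serializable)  # False
print(payload)       # {"answer": "The capital of France is Paris."}
```

The `"answer"` key is an arbitrary choice for the sketch; a real API would use whatever field names its frontend expects.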

2. StrOutputParser

2.1 Converting AIMessage → string

python

from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

# Call the LLM
ai_message = llm.invoke("What is the capital of France?")

# Convert with the parser
answer = parser.invoke(ai_message)
print(type(answer))  # <class 'str'>
print(answer)         # "The capital of France is Paris."

2.2 Extracting only the key value

What if you only need "Paris" but get a wordy answer? Adjust the prompt:

python

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template="What is the capital of {country}? Return the name of the city only.",
    input_variables=["country"]
)

prompt_value = prompt.invoke({"country": "France"})
ai_message = llm.invoke(prompt_value)
answer = parser.invoke(ai_message)
print(answer)  # "Paris"

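Tip: On many commercial models, output tokens cost more than input tokens. Exact prices change often, so design prompts to return only the information you need, and check the provider's documentation for the current price list.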
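
To make the trade-off concrete, here is a rough cost calculation. The prices and token counts below are hypothetical placeholders chosen for easy arithmetic, not real provider pricing, and `request_cost` is a helper invented for this sketch.

```python
# Hypothetical prices in USD per one million tokens; NOT real provider pricing.
INPUT_PRICE_PER_MILLION = 1.00
OUTPUT_PRICE_PER_MILLION = 4.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request under the assumed prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


# Assumed token counts: a full-sentence answer versus a one-word answer.
# The concise prompt is slightly longer because of the extra instruction.
verbose = request_cost(input_tokens=20, output_tokens=10)
concise = request_cost(input_tokens=26, output_tokens=1)

print(f"verbose: ${verbose:.6f}")  # verbose: $0.000060
print(f"concise: ${concise:.6f}")  # concise: $0.000030
```

Under these assumptions the longer prompt still costs less overall, because the saved output tokens outweigh the added input tokens. With different prices or token counts the balance can shift.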

3. Limitations of JsonOutputParser

3.1 Trying it out

What if you want information about a country as JSON?

python

from langchain_core.output_parsers import JsonOutputParser

prompt = PromptTemplate(
    template="""Tell me about {country}.
Include capital, population, language, and currency.
Return as a JSON dictionary only.""",
    input_variables=["country"]
)

parser = JsonOutputParser()

prompt_value = prompt.invoke({"country": "France"})
ai_message = llm.invoke(prompt_value)
result = parser.invoke(ai_message)  # can fail when the reply is not clean JSON

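3.2 Why does it fail?

An LLM response is always a string. Even when it looks like JSON, it is really just text, so:

A ```json markdown code fence is sometimes added and sometimes not
Removing it with replace also affects the word "json" inside the body
The format varies from call to call, so no single cleanup rule parses every reply reliably

In short, relying on JsonOutputParser to parse freely generated text is fragile in practice. When the model offers a direct structured-output path, it is safer to consider that first.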
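
The replace problem from the list above can be reproduced with the standard library. The `reply` string is a hypothetical model response made up for this sketch, and the regular expression is one possible cleanup rule, not what any particular parser does internally.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Hypothetical model reply: JSON wrapped in a markdown fence, with the word
# "json" also appearing inside the payload itself.
reply = (
    f"{FENCE}json\n"
    '{"language": "French", "note": "returned as json"}\n'
    f"{FENCE}"
)

# Naive cleanup removes every "json", including the one inside the payload.
naive = reply.replace(FENCE, "").replace("json", "")
naive_data = json.loads(naive)

# Anchored cleanup touches only a fence at the very start or very end.
pattern = rf"^\s*{FENCE}(?:json)?\s*|\s*{FENCE}\s*$"
cleaned = re.sub(pattern, "", reply)
clean_data = json.loads(cleaned)

print(repr(naive_data["note"]))  # 'returned as '
print(repr(clean_data["note"]))  # 'returned as json'
```

The anchored version handles this one shape of reply, but it still breaks if the model adds a sentence before the fence, which is why hand-written cleanup stays fragile.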

4. Pydantic + with_structured_output

4.1 Defining a Pydantic model

Define the desired output structure as a schema with Pydantic's BaseModel:

python

from pydantic import BaseModel, Field

class CountryDetail(BaseModel):
    capital: str = Field(description="Name of the capital city")
    population: int = Field(description="Population count")
    language: str = Field(description="Official language")
    currency: str = Field(description="Currency unit")

This model serves to tell the LLM: **"Return exactly these four fields."**
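
To see what that instruction looks like as data, the model can be exported as JSON Schema. This is a minimal sketch assuming Pydantic v2, where `model_json_schema()` is available; how a given LangChain integration actually passes the schema to the model depends on that integration.

```python
from pydantic import BaseModel, Field


class CountryDetail(BaseModel):
    capital: str = Field(description="Name of the capital city")
    population: int = Field(description="Population count")
    language: str = Field(description="Official language")
    currency: str = Field(description="Currency unit")


# model_json_schema() exports the model as a JSON Schema dict (Pydantic v2).
schema = CountryDetail.model_json_schema()

# All four fields are required because none of them has a default value.
print(schema["required"])

# Each property carries its type and the description given in Field(...).
print(schema["properties"]["population"])
```

The field descriptions end up inside the schema, so they act as short per-field hints for the model.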

4.2 Using with_structured_output

python

from langchain_ollama import ChatOllama
from pydantic import BaseModel, Field

class CountryDetail(BaseModel):
    capital: str = Field(description="Name of the capital city")
    population: int = Field(description="Population count")
    language: str = Field(description="Official language")
    currency: str = Field(description="Currency unit")

llm = ChatOllama(model="llama3.2:1b")

# Bind the Pydantic model to the LLM
structured_llm = llm.with_structured_output(CountryDetail)

# Invoking it returns an instance of the Pydantic model
result = structured_llm.invoke("Tell me about France")

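with_structured_output() works most reliably with models that support structured output or tool calling. ChatOllama supports it as well, but whether a call actually succeeds depends on the capabilities of the local model you choose.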

4.3 Using the result

python

# Access fields directly
print(result.capital)      # "Paris"
print(result.population)   # 67390000
print(result.language)     # "French"
print(result.currency)     # "Euro"

# Convert to a dict (use model_dump_json() for a JSON string)
json_data = result.model_dump()
print(json_data)
# {'capital': 'Paris', 'population': 67390000,
#  'language': 'French', 'currency': 'Euro'}

# Access by key
print(json_data["capital"])  # "Paris"
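
The difference between a dict and a JSON string matters when the data leaves Python. The sketch below assumes Pydantic v2 and builds the instance by hand with the sample values from this section, so that it runs without a model; in the tutorial the instance would come from `structured_llm.invoke(...)`.

```python
import json

from pydantic import BaseModel


class CountryDetail(BaseModel):
    capital: str
    population: int
    language: str
    currency: str


# Built by hand here instead of being returned by a model call.
result = CountryDetail(
    capital="Paris",
    population=67390000,
    language="French",
    currency="Euro",
)

as_dict = result.model_dump()       # Python dict, for use inside Python
as_json = result.model_dump_json()  # JSON-encoded str, ready to send out

print(type(as_dict).__name__)  # dict
print(type(as_json).__name__)  # str

# Decoding the JSON string gives back the same data as the dict.
print(json.loads(as_json) == as_dict)  # True
```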

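4.4 JsonOutputParser vs Pydantic

Item	JsonOutputParser	Pydantic + with_structured_output
Reliability	Unreliable (format varies between calls)	Reliable (schema enforced)
Type validation	None	Automatic (int, str, etc.)
Field access	dict["key"]	result.field or dict["key"]
Conversion for output	Manual parsing required	One line: model_dump() for a dict, model_dump_json() for a JSON string
Production use	Possible in limited cases, but fragile	Recommended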
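
The "Type validation" row can be checked directly. This is a minimal sketch assuming Pydantic v2 in its default (lax) mode; the input dicts are made-up examples of what a model might return.

```python
from pydantic import BaseModel, ValidationError


class CountryDetail(BaseModel):
    capital: str
    population: int
    language: str
    currency: str


base = {"capital": "Paris", "language": "French", "currency": "Euro"}

# A numeric string is coerced to int in Pydantic's default (lax) mode.
ok = CountryDetail.model_validate({**base, "population": "67390000"})

# A non-numeric value for an int field is rejected with ValidationError.
try:
    CountryDetail.model_validate({**base, "population": "about 67 million"})
    rejected = False
    error_fields = []
except ValidationError as exc:
    rejected = True
    # Each error records the location of the offending field.
    error_fields = [err["loc"][0] for err in exc.errors()]

print(ok.population)  # 67390000
print(rejected)       # True
print(error_fields)   # ['population']
```

A plain dict parsed from JSON would accept either value silently, which is the gap the table's "None" entry refers to.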

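5. Comparing the overall flow

Key Takeaways

StrOutputParser: converts AIMessage → string. Useful for frontend integration
JsonOutputParser: the format of LLM responses is unstable, so it is not recommended in practice
Pydantic + with_structured_output: the recommended way to get reliable structured responses by defining a schema
model_dump() converts a Pydantic instance to a dict; model_dump_json() produces a JSON string
Adding "Return only..." to the prompt trims unnecessary output and lowers cost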
