Picept provides a powerful evaluation system that allows you to run multiple evaluations with a single call. We offer both a Python SDK for seamless integration and direct API access for maximum flexibility. Let’s walk through a complete example that evaluates both factuality and hallucination detection.

Make sure you have your Picept API key ready. You can find this in your dashboard under Settings → API Keys.
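Rather than hard-coding the key, a common pattern is to read it from an environment variable. This is a minimal sketch; the variable name `PICEPT_API_KEY` is a convention we use here, not a requirement:

```python
import os

# Read the API key from the environment instead of embedding it in source code.
api_key = os.environ.get("PICEPT_API_KEY", "")
if not api_key:
    print("Warning: PICEPT_API_KEY is not set")
```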

Basic Implementation

There are two ways to integrate Picept evaluations: using our Python SDK or making direct API calls.

from picept import Client

client = Client(api_key="PICEPT_API_KEY") # Replace with your Picept API key

response = client.evaluations.create(
    evaluation_name="evaluation_name",  # Optional name for your evaluation job
    dataset={
        "prompt": "What is the scientific name of a domestic cat?",
        "response": "The scientific name of a domestic cat is Felis sylvestris. Unlike other cats, it developed from the ancient Mesopotamian cats (Felis mesopotamicus) around 8,000 years ago in the fertile crescent. According to recent DNA studies by Dr. Sarah Johnson at Oxford University, these early cats had unique genetic markers that we still see in modern house cats.",
        "reference": "Felis catus",
        "context": "The domestic cat, scientifically named Felis catus, is a small carnivorous mammal from the Felidae family. It traces its origins to the African wildcat (Felis lybica) and has been domesticated for thousands of years."
    },
    evaluators={
        "hallucination": {
            "prompt": "prompt",
            "response": "response",
            "context": "context",
            "judge_model": "gpt-4o[openai]",
            "explanation": True,
            "passing_criteria": ["No hallucination (Strict)"]
        },
        "factuality": {
            "prompt": "prompt",
            "reference": "reference",
            "response": "response",
            "judge_model": "gpt-4o-mini[openai]",
            "explanation": True,
            "passing_criteria": ["Response is a consistent subset of the reference"]
        }
    }
)

print(response.json())

Understanding the Response

The API returns a detailed response for each evaluator:

{
  "factuality": {
    "passed": false,
    "explanation": "The response incorrectly states that the scientific name of a domestic cat is \"Felis sylvestris,\" whereas the reference confirms it is \"Felis catus.\"...",
    "prompt": "What is the scientific name of a domestic cat?",
    "response": "The scientific name of a domestic cat is Felis sylvestris...",
    "reference": "Felis catus"
  },
  "hallucination": {
    "passed": false,
    "explanation": "The response contains multiple factual claims that are not supported by the given context...",
    "prompt": "What is the scientific name of a domestic cat?",
    "response": "The scientific name of a domestic cat is Felis sylvestris...",
    "context": "The domestic cat, scientifically named Felis catus..."
  }
}

Integration Options

  1. Python SDK

    • Simplified integration with native Python objects
    • Automatic response handling
    • Built-in error handling and retries
    • Type hints and IDE support
  2. HTTP Request

    • Direct RESTful API access
    • Language-agnostic implementation
    • Fine-grained control over requests
    • Flexible integration options
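For the direct HTTP route, the request body mirrors the SDK arguments. The sketch below builds that payload; the endpoint URL and auth header are illustrative assumptions — check the API reference in your dashboard for the exact values:

```python
import json

# Hypothetical endpoint and auth scheme -- verify against the Picept API reference.
API_URL = "https://api.picept.ai/v1/evaluations"
HEADERS = {"Authorization": "Bearer PICEPT_API_KEY", "Content-Type": "application/json"}

# Same shape as the SDK example: a dataset plus one or more evaluator configs.
payload = {
    "evaluation_name": "evaluation_name",
    "dataset": {
        "prompt": "What is the scientific name of a domestic cat?",
        "response": "The scientific name of a domestic cat is Felis sylvestris...",
        "reference": "Felis catus",
        "context": "The domestic cat, scientifically named Felis catus...",
    },
    "evaluators": {
        "hallucination": {
            "prompt": "prompt",
            "response": "response",
            "context": "context",
            "judge_model": "gpt-4o[openai]",
            "explanation": True,
            "passing_criteria": ["No hallucination (Strict)"],
        },
    },
}

body = json.dumps(payload)

# To send it (requires the third-party `requests` package):
# import requests
# resp = requests.post(API_URL, headers=HEADERS, data=body)
# print(resp.json())
```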

Key Components

  1. Dataset Structure

    • prompt: The initial question or instruction
    • response: The AI model’s output to evaluate
    • reference: The correct answer (for factuality checks)
    • context: Additional information for context-aware evaluations
  2. Evaluator Configuration

    • Multiple evaluators can be specified in a single request
    • Each evaluator can use different judge models
    • Customize passing criteria for each evaluation
    • Enable detailed explanations for transparency
  3. Response Interpretation

    • passed: Boolean indicating if the evaluation criteria were met
    • explanation: Detailed reasoning for the evaluation result
    • Original inputs included for reference
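Putting these fields to work, a simple gating pattern is to collect the evaluators that failed and surface their explanations. This sketch assumes the parsed response has the shape shown above (explanations abbreviated here for brevity):

```python
# Parsed evaluation results, e.g. from response.json().
results = {
    "factuality": {"passed": False, "explanation": "Response states Felis sylvestris; reference is Felis catus."},
    "hallucination": {"passed": False, "explanation": "Claims are not supported by the given context."},
}

# Collect the names of evaluators whose criteria were not met.
failed = [name for name, result in results.items() if not result["passed"]]
for name in failed:
    print(f"{name} failed: {results[name]['explanation']}")

all_passed = not failed
print("All evaluations passed" if all_passed else f"{len(failed)} evaluation(s) failed")
```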

Next Steps