Core Evaluators
Master the art of AI evaluation with Picept’s comprehensive suite of evaluators
In today’s AI landscape, ensuring the quality, safety, and reliability of AI outputs isn’t just a technical requirement—it’s a business imperative. Picept’s evaluators go beyond simple checks, offering a sophisticated system that helps you build trust in your AI applications while maintaining compliance and quality standards.
One of the most powerful features across all our evaluators is the flexible judge model selection. You can choose from over 100 different models, ranging from lightweight options for rapid testing to sophisticated models for nuanced evaluation. Even better, you can use different judge models for different evaluator types in the same API call, optimizing for both performance and accuracy.
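As a rough sketch of what mixing judge models in one call could look like, the snippet below bundles two evaluator configurations into a single payload. The evaluators wrapper key, the evaluator names, and the model names are placeholders invented for illustration; they are not Picept's documented schema, so check the API reference for the real request shape.

```python
from __future__ import annotations

from typing import Any


def build_multi_evaluator_request(evaluators: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Bundle several evaluator configs into one payload, one judge model each."""
    for name, config in evaluators.items():
        if not config.get("judge_model"):
            raise ValueError(f"evaluator {name!r} is missing a judge_model")
    # "evaluators" as the top-level key is an assumption, not a documented field.
    return {"evaluators": evaluators}


request = build_multi_evaluator_request({
    # Lightweight judge for a quick check (placeholder model name).
    "hallucination": {"judge_model": "small-fast-model", "explanation": False},
    # Stronger judge where nuance matters (placeholder model name).
    "policy_adherence": {"judge_model": "large-careful-model", "explanation": True},
})
```

Keeping the judge model inside each evaluator's own configuration makes the cost and accuracy trade-off explicit per check.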
Let’s explore each evaluator and see how they can transform your AI quality assurance process.
Content Quality Evaluators
Hallucination Detection
Every AI system can occasionally generate information that isn’t supported by available context. Our hallucination detector helps you catch these instances before they impact your users.
When you enable explanation: True, you get detailed insights into how the AI evaluator reached its conclusion. This isn’t just a pass/fail result; it’s a comprehensive analysis that helps you understand and improve your system’s performance.
Input Parameters:
- prompt: Original input prompt
- response: Model’s response to evaluate
- context: Reference context for verification
- judge_model: Choose from our extensive model library
- explanation: Get detailed reasoning when set to True
- passing_criteria: Customize your strictness level
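One way to keep these inputs organized is a small data class whose fields mirror the parameter list. This is only a sketch: the field types, the defaults, the sample model name, and the idea of sending the result as a flat dictionary are all assumptions rather than Picept's documented interface.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HallucinationCheck:
    """Fields mirror the parameter list; types and defaults are assumptions."""
    prompt: str
    response: str
    context: str
    judge_model: str
    explanation: bool = False
    passing_criteria: Optional[str] = None  # accepted values are not specified here

    def to_payload(self) -> dict:
        # Drop unset optional fields so any service-side defaults can apply.
        return {key: value for key, value in asdict(self).items() if value is not None}


check = HallucinationCheck(
    prompt="When was the Eiffel Tower completed?",
    response="It was completed in 1889 and is 500 metres tall.",
    context="The Eiffel Tower was completed in 1889. It stands about 330 metres tall.",
    judge_model="placeholder-judge-model",  # hypothetical model name
    explanation=True,
)
payload = check.to_payload()
```

Here the response contradicts the context on height, which is the kind of unsupported claim the evaluator is meant to surface.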
Content Safety
Modern AI systems need sophisticated safety measures. Our content safety evaluator doesn’t just flag issues—it helps you understand and address them comprehensively.
The criteria system is highly configurable, letting you focus on what matters most for your use case:
- Toxicity Detection: Identifies harmful or offensive content
- Bias Analysis: Helps ensure fair and balanced outputs
- NSFW Content: Maintains professional and appropriate content standards
- Topic Detection: Ensures content stays within expected domains
- Keyword Detection: Monitors for specific terms or phrases
- PII Detection: Here’s where Picept really shines. Beyond just identifying personal information, we can automatically replace it with realistic synthetic data. This means you can continue using the data for training and testing while maintaining privacy—a game-changer for building better AI systems.
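To make the idea of PII replacement concrete, here is a toy, local illustration that swaps email addresses for synthetic ones while keeping repeated addresses consistent. It is not Picept's implementation: the regular expression, the function name, and the user1@example.com naming scheme are assumptions chosen for the example, and real PII detection covers far more than emails.

```python
from __future__ import annotations

import re

# Deliberately simple email pattern; it is an illustration, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def replace_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a stable synthetic address."""
    mapping: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        if original not in mapping:
            # The same real address always maps to the same synthetic one.
            mapping[original] = f"user{len(mapping) + 1}@example.com"
        return mapping[original]

    return EMAIL_RE.sub(substitute, text), mapping


cleaned, mapping = replace_emails(
    "Contact jane.doe@corp.io or bob@mail.org; jane.doe@corp.io prefers email."
)
```

Because the mapping is stable, relationships in the data (the same person appearing twice) survive the replacement, which is what keeps the text useful for training and testing.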
Policy Adherence
In an era of increasing AI regulation, ensuring your AI outputs comply with policies is crucial. Our policy adherence evaluator helps you encode and enforce your guidelines systematically.
The policy parameter accepts detailed guidelines in natural language—no need to translate your policies into complex rules. The system understands and applies them intelligently.
Input Parameters:
- response: Content to evaluate
- judge_model: Select the most appropriate evaluator for your policies
- Policy: Your guidelines in plain English
- explanation: Get detailed compliance analysis
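Since the policy is passed as natural language, one convenient pattern is to keep rules as a list and join them into a single numbered text. The sketch below assumes a flat dictionary payload and a lowercase policy key (the prose and the parameter list differ on casing); the helper name and model name are hypothetical.

```python
from __future__ import annotations


def build_policy_request(response: str, rules: list[str], judge_model: str,
                         explanation: bool = True) -> dict:
    """Join plain-language rules into one policy text and assemble the payload."""
    policy_text = "\n".join(
        f"{number}. {rule.strip()}" for number, rule in enumerate(rules, start=1)
    )
    return {
        "response": response,
        "judge_model": judge_model,
        "policy": policy_text,  # key casing is an assumption
        "explanation": explanation,
    }


policy_request = build_policy_request(
    response="You should move all your savings into this one stock.",
    rules=[
        "Do not give individualized financial advice.",
        "Recommend consulting a licensed professional for investment decisions.",
    ],
    judge_model="placeholder-judge-model",  # hypothetical model name
)
```

Numbering the rules makes it easier to match a compliance explanation back to the guideline it refers to.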
Factuality Check
When accuracy matters—and it always does—our factuality checker ensures your AI system’s outputs align with known facts and reference materials. Unlike simple text comparison, this evaluator understands context and nuance, identifying both direct contradictions and subtle inconsistencies.
With explanation: True, you get detailed insights into why something was flagged as factually incorrect, helping you pinpoint and address the root causes of inaccuracies.
Input Parameters:
- prompt: Original query for context
- reference: Your source of truth
- response: Content to evaluate
- judge_model: Select from our model library
- passing_criteria: Multiple levels of strictness available
Model Benchmark
Understanding how different models perform is crucial for optimizing your AI system. Our benchmark evaluator goes beyond basic metrics, providing detailed comparisons across multiple dimensions.
The real power comes from the ability to test multiple models simultaneously with the same input, making it easy to select the best model for your specific use case. Add a system prompt to standardize outputs and custom judge instructions for specialized evaluation criteria.
Input Parameters:
- prompt: Test input
- system_prompt: Guide model behavior
- judge_instruction: Custom evaluation criteria
- models: List of models to compare
- passing_criteria: Performance threshold
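A benchmark request with these inputs might be assembled as follows. Treat it as a sketch: the flat dictionary layout, the model names, and the check for at least two distinct models are assumptions made for the example, not documented behavior.

```python
from __future__ import annotations

from typing import Any, Optional


def build_benchmark_request(prompt: str, models: list[str], system_prompt: str = "",
                            judge_instruction: str = "",
                            passing_criteria: Optional[Any] = None) -> dict[str, Any]:
    """Assemble one request that sends the same prompt to several models."""
    unique_models = list(dict.fromkeys(models))  # drop duplicates, keep order
    if len(unique_models) < 2:
        raise ValueError("a comparison needs at least two distinct models")
    request: dict[str, Any] = {"prompt": prompt, "models": unique_models}
    # Only include optional fields the caller actually set.
    if system_prompt:
        request["system_prompt"] = system_prompt
    if judge_instruction:
        request["judge_instruction"] = judge_instruction
    if passing_criteria is not None:
        request["passing_criteria"] = passing_criteria
    return request


benchmark = build_benchmark_request(
    prompt="Summarize the attached incident report in three sentences.",
    models=["model-a", "model-b", "model-a"],  # hypothetical model names
    system_prompt="Answer in plain English.",
    judge_instruction="Prefer summaries that mention the root cause.",
)
```

Using the same prompt and system prompt for every model keeps the comparison fair, so differences in the results reflect the models rather than the inputs.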
Sentiment Analysis
Understanding the emotional tone of AI outputs is crucial for maintaining appropriate user interactions. Our sentiment analyzer doesn’t just classify text as positive or negative—it provides nuanced insights into emotional undertones and potential impact.
Input Parameters:
- input_text: Content to analyze
- judge_model: Select for your specific needs
- passing_criteria: Target sentiment
- explanation: Get detailed emotional analysis
Technical Validators
JSON/XML Validation
Data structure validation is critical for maintaining system integrity. Our validators don’t just check syntax—they ensure your structured data meets specific schema requirements and business rules.
What sets our validators apart is their ability to provide helpful context when validation fails, making debugging and fixes much faster. You can even specify reference schemas to ensure strict compliance with your data standards.
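For a feel of what validation with helpful context means, the following local sketch uses only Python's standard library to check syntax and report where parsing failed. It does not call Picept and does not reproduce its validators; the function names and the simple required-keys check (standing in for a reference schema) are assumptions for illustration.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET


def check_json(text: str, required_keys: tuple[str, ...] | list[str] = ()) -> tuple[bool, str]:
    """Check JSON syntax, then a minimal 'schema': required top-level keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if required_keys and not isinstance(data, dict):
        return False, "expected a JSON object at the top level"
    missing = [key for key in required_keys if key not in data]
    if missing:
        return False, f"missing required keys: {', '.join(missing)}"
    return True, "ok"


def check_xml(text: str) -> tuple[bool, str]:
    """Check that the XML is well-formed and report the failure position."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return False, f"not well-formed at line {line}, column {column}"
    return True, "ok"


json_result = check_json('{"id": 1}', required_keys=["id", "name"])
xml_result = check_xml("<order><item></order>")
```

Returning a message alongside the pass/fail flag is what makes failures quick to debug: the caller sees which key is missing or where the parser stopped.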
Input Parameters for JSON:
Input Parameters for XML:
Custom Evaluations
Sometimes you need evaluations tailored to your specific use case. Our custom prompt evaluator lets you define exactly what you want to assess, with the full power of our evaluation infrastructure behind it.
Input Parameters:
- input_text: Content to evaluate
- judge_instruction: Your custom evaluation criteria
- judge_model: Select the most appropriate model
- passing_criteria: Define success conditions
Next Steps
Now that you understand the power and flexibility of Picept’s evaluators, you’re ready to:
- Set up your first evaluation pipeline
- Explore our playground environment
- Create custom evaluations for your specific needs
- Implement continuous monitoring of your AI systems
Check out our Batch Processing guide to learn how to scale your evaluations efficiently.