Scale your evaluations efficiently with batch processing capabilities
When you need to evaluate large volumes of AI interactions efficiently, Picept’s batch processing capabilities have you covered. Instead of making individual API calls, you can evaluate hundreds or thousands of LLM interactions in one go, analyzing everything from model outputs to conversation flows while maintaining comprehensive quality checks.
Simply pass lists of inputs in your dataset, and Picept automatically handles the batch processing. Here’s a complete example:
import requests

PICEPT_API_KEY = "your-picept-api-key"  # Replace with your actual API key

payload = {
    "evaluation_name": "evaluation_name",  # Optional name for your evaluation job
    "dataset": {
        "prompt": [
            "What is the scientific name of a domestic cat?",
            "What is the tallest building in New York City?"
        ],
        "response": [
            "The scientific name of a domestic cat is Felis sylvestris...",
            "The tallest building in New York City is the Freedom Tower..."
        ],
        "reference": [
            "Felis catus",
            "One World Trade Center, 1,776 feet"
        ],
        "context": [
            "The domestic cat, scientifically named Felis catus...",
            "One World Trade Center (also known as Freedom Tower...)"
        ]
    },
    "evaluators": {
        "hallucination": {
            "prompt": "prompt",
            "response": "response",
            "context": "context",
            "judge_model": "gpt-4o[openai]",
            "explanation": True,
            "passing_criteria": ["No hallucination (Strict)"]
        },
        "factuality": {
            "prompt": "prompt",
            "reference": "reference",
            "response": "response",
            "judge_model": "gpt-4o-mini[openai]",
            "explanation": True,
            "passing_criteria": ["Response is a consistent subset of the reference"]
        }
    }
}

response = requests.post(
    "https://api.picept.ai/v1/evaluation",
    json=payload,
    headers={
        "Authorization": f"Bearer {PICEPT_API_KEY}",
        "Content-Type": "application/json"
    }
)
We understand that different teams have different needs when it comes to handling their evaluation data. That’s why we’ve built multiple ways to feed your LLM interactions into Picept’s evaluation system.
Want to programmatically send your data? Our API makes it seamless. Just pass your arrays directly in the payload, and we’ll handle the rest. It’s perfect when you’re working with smaller datasets or need tight integration with your existing systems. The best part? Our system automatically optimizes the batch size for you while keeping you updated on progress in real time.
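If your interactions live as row-oriented records (one dict per interaction, as they often do in application logs), a small helper can pivot them into the column-oriented lists the dataset field expects. The to_dataset helper below is an illustrative sketch, not part of any Picept SDK:

import requests

PICEPT_API_KEY = "your-picept-api-key"  # Replace with your actual API key

# Row-oriented records, e.g. pulled from your own logs or database.
records = [
    {"prompt": "What is 2 + 2?", "response": "4", "reference": "4"},
    {"prompt": "What is the capital of France?", "response": "Paris", "reference": "Paris"}
]

def to_dataset(rows):
    # Pivot per-interaction dicts into column-oriented lists.
    # Keep the keys consistent across records so every column ends up
    # the same length. (Illustrative helper, not part of a Picept SDK.)
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

payload = {
    "evaluation_name": "nightly-regression",
    "dataset": to_dataset(records),
    "evaluators": {
        "factuality": {
            "prompt": "prompt",
            "reference": "reference",
            "response": "response",
            "judge_model": "gpt-4o-mini[openai]",
            "explanation": True,
            "passing_criteria": ["Response is a consistent subset of the reference"]
        }
    }
}

response = requests.post(
    "https://api.picept.ai/v1/evaluation",
    json=payload,
    headers={
        "Authorization": f"Bearer {PICEPT_API_KEY}",
        "Content-Type": "application/json"
    }
)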
Got a massive dataset sitting in spreadsheets or JSON files? No problem. Our UI makes it incredibly simple to upload these files directly. Just drag and drop, and we’ll take care of mapping the columns correctly. It’s especially useful when you’re dealing with historical data or need to run one-off evaluations on large datasets. And don’t worry about the format - if it’s a standard CSV or JSON file, we’ve got you covered.
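For reference, here’s what a minimal CSV for the example above could look like. The column names mirror the dataset keys from the API payload (prompt, response, reference, context); the exact headers you need depend on which evaluators you plan to run:

prompt,response,reference,context
"What is the scientific name of a domestic cat?","The scientific name of a domestic cat is Felis sylvestris...","Felis catus","The domestic cat, scientifically named Felis catus..."
"What is the tallest building in New York City?","The tallest building in New York City is the Freedom Tower...","One World Trade Center, 1,776 feet","One World Trade Center (also known as Freedom Tower...)"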
Here’s where things get really interesting. Connect Picept directly to your production systems, and you can evaluate your LLM interactions as they happen. With Picept, you can run evaluations on your chat completions directly from your production environment – no need to extract or transform your data. Stream your evaluation data in real time, set up continuous monitoring, and get instant alerts if something goes wrong. Better yet, schedule monitoring jobs to automatically evaluate your production data at regular intervals, helping you maintain and improve your AI application’s reliability over time. It’s like having a quality assurance team working 24/7, constantly learning from real user interactions to ensure your AI systems maintain high standards. (A minimal code sketch of this pattern follows below.)

Think of it as having different gears in your evaluation engine - choose the one that best fits your speed and style. Whether you’re doing a quick test run with the API, analyzing months of historical data through file uploads, or leveraging your production data for continuous quality monitoring, we’ve built Picept to adapt to your workflow, not the other way around. You’re not just evaluating – you’re building a more reliable AI system that learns and improves from real-world usage patterns.
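To make the production path concrete, here’s a minimal sketch that evaluates a single live chat completion as a batch of size one. The OpenAI call uses the standard openai Python client; wiring it to Picept this way is one illustrative pattern, and retrieved_context is a placeholder for whatever grounding data your application already has:

import requests
from openai import OpenAI

PICEPT_API_KEY = "your-picept-api-key"  # Replace with your actual API key
client = OpenAI()  # Reads OPENAI_API_KEY from the environment

user_prompt = "What is the tallest building in New York City?"
retrieved_context = "One World Trade Center (also known as Freedom Tower...)"  # Placeholder

# 1. Serve the request from your production model as usual.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": user_prompt}]
)
answer = completion.choices[0].message.content

# 2. Forward the live interaction to Picept for evaluation;
#    a single interaction is simply a batch of size one.
payload = {
    "evaluation_name": "production-monitoring",
    "dataset": {
        "prompt": [user_prompt],
        "response": [answer],
        "context": [retrieved_context]
    },
    "evaluators": {
        "hallucination": {
            "prompt": "prompt",
            "response": "response",
            "context": "context",
            "judge_model": "gpt-4o[openai]",
            "explanation": True,
            "passing_criteria": ["No hallucination (Strict)"]
        }
    }
}

requests.post(
    "https://api.picept.ai/v1/evaluation",
    json=payload,
    headers={
        "Authorization": f"Bearer {PICEPT_API_KEY}",
        "Content-Type": "application/json"
    }
)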
All batch evaluations are automatically logged in your Picept dashboard, providing a comprehensive view of your evaluation results and trends. The interactive dashboard turns your evaluation data into actionable insights.
The analytics dashboard offers:
Real-time progress tracking of ongoing evaluations
Detailed success/failure rates across different evaluator types
Interactive visualizations of evaluation trends
Comprehensive reports exportable in multiple formats
Team collaboration features for shared analysis
Custom alert configurations based on your metrics
Historical performance tracking and benchmarking
Each evaluation job gets its own detailed report page where you can: