Understanding Audits
An audit is Docubat's core feature: an automated test that verifies whether AI models can understand your documentation and use it to write working code. This page explains how audits work, what they test, and how to interpret results.
What is an Audit?
Think of an audit as a comprehensive test where we ask AI models to read your documentation and write working code. Each audit tests multiple combinations of:
- Programming Languages (Python, JavaScript, Java, etc.)
- AI Models (GPT-4, Claude, etc.)
- Your Documentation (APIs, SDKs, tutorials)
The goal is to identify where your documentation might be unclear, incomplete, or difficult for AI to interpret.
How Audits Work
The Audit Process
When you run an audit, here's what happens behind the scenes:
1. Documentation Processing
- Docubat fetches your documentation from the URLs you provided
- Content is processed and organized by programming language
- Documentation is optimized for AI consumption while preserving accuracy
2. Implementation Planning
For each programming language and AI model combination:
- The AI creates an implementation plan based on your task description
- Multiple attempts are made (up to 3 tries per combination)
- Each attempt learns from previous failures
3. Code Generation
- AI models write actual, executable code in the target programming language
- Code follows the task requirements and documentation guidelines
- Generated code includes proper error handling and best practices
4. Execution and Testing
- Generated code runs in secure, isolated cloud environments
- Tests execute with any authentication credentials you provided
- Results are captured, including output, errors, and execution logs
5. Results Analysis
- Actual output is compared against your expected output (a simplified sketch of this comparison appears after this list)
- Code structure is validated against any specified requirements
- Success/failure is determined based on multiple criteria
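To make step 5 concrete, here is a minimal sketch of the kind of check the results analysis boils down to. The field names and criteria below are illustrative assumptions, not Docubat's internal implementation.

```python
# Illustrative only: a simplified version of the output comparison performed
# during results analysis. Field and parameter names are hypothetical.
def analyze_trial(actual_output: str, exit_code: int, expected_phrases: list[str]) -> dict:
    """Compare a trial's captured output against the expected success criteria."""
    ran_cleanly = exit_code == 0                                   # error-free execution
    missing = [p for p in expected_phrases if p.lower() not in actual_output.lower()]
    return {
        "success": ran_cleanly and not missing,                    # both criteria must hold
        "ran_cleanly": ran_cleanly,
        "missing_criteria": missing,                               # expectations not met
    }

# Example: a trial whose generated code printed the created user's ID
result = analyze_trial(
    actual_output="Created user with ID 4821",
    exit_code=0,
    expected_phrases=["created user", "id"],
)
print(result)  # {'success': True, 'ran_cleanly': True, 'missing_criteria': []}
```

In practice the analysis also weighs structural requirements and error handling, which is why clearly defined expected output (see Audit Configuration below) matters.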
What Makes an Audit Succeed?
An audit succeeds when:
- Functional Success: The generated code produces the expected output
- Structural Compliance: Code follows specified patterns and requirements
- Error-Free Execution: Code runs without critical errors
- Output Matching: Results match your defined success criteria
What Makes an Audit Fail?
Common failure reasons include:
- Documentation Gaps: Missing crucial information for implementation
- Ambiguous Instructions: Unclear or conflicting guidance
- Authentication Issues: Problems with API keys or access credentials
- Language-Specific Gaps: Missing examples for specific programming languages
- Outdated Information: Documentation that doesn't match current API behavior
Audit Configuration
Task Definition
The task description is the most critical part of your audit configuration:
Good Task Description:
Create a new user account using our API. The user should have a name,
email, and password. Return the created user's ID and handle any
validation errors appropriately.
Poor Task Description:
Use our API to create a user.
Expected Output
Define clear success criteria:
Specific Expected Output:
Successfully created user with returned user ID (numeric).
Error handling for duplicate emails and invalid passwords.
Proper HTTP status codes (201 for success, 400 for validation errors).
Vague Expected Output:
User creation works.
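For reference, the sketch below shows roughly the code that the good task description and specific expected output above should elicit from an AI model. The base URL, endpoint, and payload fields are hypothetical stand-ins for your own API; treat it as an illustration of the target, not a reference implementation.

```python
# Hypothetical example of generated code for the "create a user account" task.
# The base URL, /users endpoint, and field names are placeholders, not a real API.
import os
import requests

API_BASE = "https://api.example.com"  # placeholder; your documentation defines the real host

def create_user(name: str, email: str, password: str) -> int:
    """Create a user and return the new user's numeric ID."""
    response = requests.post(
        f"{API_BASE}/users",
        json={"name": name, "email": email, "password": password},
        headers={"Authorization": f"Bearer {os.environ.get('API_KEY', '')}"},
        timeout=10,
    )
    if response.status_code == 201:          # success criterion: 201 plus a numeric ID
        return response.json()["id"]
    if response.status_code == 400:          # validation errors: duplicate email, weak password
        raise ValueError(f"Validation failed: {response.json().get('error')}")
    response.raise_for_status()              # surface any other HTTP error
    raise RuntimeError(f"Unexpected status code: {response.status_code}")

if __name__ == "__main__":
    user_id = create_user("Ada Lovelace", "ada@example.com", "s3cure-passw0rd")
    print(f"Successfully created user with ID {user_id}")
```

The more precisely your task and expected output pin down details like these (status codes, required fields, error cases), the easier it is for both the AI and the results analysis to agree on what success means.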
Programming Language Selection
Choose languages strategically:
- Start Small: Begin with 2-3 key languages
- Consider Your Audience: Focus on languages your developers actually use
- Test Officially Supported Languages: Verify that every language you officially support performs well
AI Model Selection
Balance coverage with cost:
- Popular Models: Include models your users are likely to use
- Version Variety: Test both latest and slightly older model versions
- Cost Considerations: More models = higher cost but better coverage
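Taken together, a selection might look like the following illustrative configuration, written here as plain data. The field names and model identifiers are assumptions made for this example, not Docubat's actual schema; the point is that trial count, and therefore cost, grows multiplicatively.

```python
# Hypothetical audit configuration expressed as plain data. Field names and
# model identifiers are illustrative; configure the real values in Docubat.
audit_config = {
    "task": "Create a new user account using our API ...",
    "languages": ["python", "javascript", "java"],   # start with 2-3 key languages
    "models": ["gpt-4", "claude"],                   # models your users actually rely on
    "max_attempts": 3,                               # retries per language/model pair
}

# Every language is paired with every model, so coverage and cost scale together:
combinations = len(audit_config["languages"]) * len(audit_config["models"])
print(combinations)  # 6
```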
Interpreting Results
Success Metrics
Audit results include several key metrics:
- Overall Success Rate: Percentage of language/model combinations that succeeded
- Language-Specific Success: How well each programming language performed
- Model-Specific Success: How different AI models performed
- Error Patterns: Common failure reasons across attempts
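If you script against exported results, these metrics reduce to simple counting over the individual trials. A rough sketch, assuming a hypothetical list of trial records:

```python
# Illustrative metric computation over hypothetical trial records.
from collections import defaultdict

trials = [
    {"language": "python", "model": "gpt-4",  "success": True},
    {"language": "python", "model": "claude", "success": True},
    {"language": "java",   "model": "gpt-4",  "success": False},
    {"language": "java",   "model": "claude", "success": False},
]

overall = sum(t["success"] for t in trials) / len(trials)
print(f"Overall success rate: {overall:.0%}")   # 50%

by_language = defaultdict(list)
for t in trials:
    by_language[t["language"]].append(t["success"])

for language, outcomes in by_language.items():
    print(f"{language}: {sum(outcomes) / len(outcomes):.0%}")   # python: 100%, java: 0%
```

The overall rate alone can hide a language that always fails, which is why the per-language and per-model breakdowns deserve as much attention as the headline number.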
Detailed Trial Information
For each language/model combination, you'll see:
- Generated Code: The actual code the AI produced
- Execution Output: What happened when the code ran
- Error Messages: Any errors encountered during execution
- Token Usage: How many tokens the AI used (affects cost)
- Execution Time: How long the test took to run
Failure Analysis
When audits fail, look for patterns:
- Consistent Failures Across Languages: Likely a documentation issue
- Language-Specific Failures: Missing language-specific examples or guidance
- Model-Specific Failures: Some models may struggle with certain types of tasks
- Authentication Failures: Issues with API keys or access permissions
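A quick way to surface these patterns is to group failures by dimension: per language, per model, and per error reason. The trial records and error strings below are invented for illustration:

```python
# Illustrative pattern spotting: count failures per language and per reason
# to see whether a problem is documentation-wide or language-specific.
from collections import Counter

failed_trials = [
    {"language": "java",   "model": "gpt-4",  "error": "missing installation instructions"},
    {"language": "java",   "model": "claude", "error": "missing installation instructions"},
    {"language": "python", "model": "gpt-4",  "error": "401 Unauthorized"},
]

print(Counter(t["language"] for t in failed_trials).most_common())
# [('java', 2), ('python', 1)]  -> points at a Java-specific documentation gap
print(Counter(t["error"] for t in failed_trials).most_common())
# repeated auth errors would instead point at credentials or authentication docs
```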
Improving Your Documentation
Common Issues and Solutions
Issue: Low Success Rates Across All Languages
Solution: Review your core documentation for clarity and completeness
Issue: Specific Language Always Fails
Solution: Add language-specific examples and installation instructions
Issue: Authentication Errors
Solution: Verify API keys and provide clearer authentication documentation
Issue: Code Structure Failures
Solution: Add code examples and explain expected patterns
Iterative Improvement Process
1. Run Initial Audit: Get baseline results
2. Identify Patterns: Look for common failure reasons
3. Update Documentation: Make targeted improvements
4. Re-run Audit: Test your improvements
5. Repeat: Continue until you achieve acceptable success rates
Advanced Features
Scheduled Audits
Set up recurring audits to:
- Catch documentation drift over time
- Verify that documentation changes don't cause previously passing audits to fail
- Monitor how AI model improvements affect your results
Team Collaboration
- Shared Configurations: Team members can collaborate on audit setups
- Results Sharing: Share audit results with stakeholders
- Role-Based Access: Control who can view, edit, or run audits
Custom Validation
Advanced audit configurations can include:
- Code Structure Requirements: Specify patterns the generated code must follow
- Performance Criteria: Test not just functionality but performance
- Security Validation: Ensure generated code follows security best practices
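As an illustration of what a code structure requirement can check, the sketch below validates generated Python using the standard ast module. The required function name and the try/except requirement are hypothetical examples of such rules, not Docubat's validation engine:

```python
# Illustrative structural validation: confirm that generated Python code
# defines a required function and contains explicit error handling.
import ast

def meets_structure_requirements(source: str, required_function: str) -> bool:
    tree = ast.parse(source)
    has_function = any(
        isinstance(node, ast.FunctionDef) and node.name == required_function
        for node in ast.walk(tree)
    )
    has_error_handling = any(isinstance(node, ast.Try) for node in ast.walk(tree))
    return has_function and has_error_handling

generated = """
def create_user(name, email, password):
    try:
        return 42
    except ValueError:
        return None
"""
print(meets_structure_requirements(generated, "create_user"))  # True
```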
Best Practices
Documentation Preparation
- Keep Documentation Current: Outdated docs lead to failed audits
- Provide Complete Examples: Include full, working code examples
- Test Documentation Manually: Ensure humans can follow your docs successfully
- Include Error Scenarios: Document what happens when things go wrong
Audit Design
- Start Simple: Begin with basic tasks before testing complex scenarios
- Test Incrementally: Build up complexity gradually
- Focus on User Journeys: Test the paths real developers will take
- Consider Edge Cases: Include both happy path and error scenarios
Result Analysis
- Look for Patterns: Don't focus on individual failures
- Consider Your Audience: Weight results based on your actual user base
- Track Trends: Monitor how results change over time
- Act on Results: Use audit feedback to actually improve documentation
Next Steps
- Review our Getting Started guide for step-by-step setup instructions
- Check out Pricing to understand audit costs
- Start with a simple audit to get familiar with the platform
Need help interpreting your audit results? Contact us at lets-get-jam@gmail.com.