Skill Evaluation
claw skill evaluate ab-test-analyzer
Score Breakdown
Safety: 96
Executability: 86
Completeness: 82
Maintainability: 84
Cost: 88
Check Results
Evaluation Checks
Overall Score: 85
input_validation: PASS
Data format and sample sizes validated
statistical_correctness: PASS
Calculations verified against reference implementations
multiple_testing_correction: PASS
Bonferroni correction applied for multi-variant tests
sample_size_check: PASS
Warns when sample size is insufficient
visualization: PASS
Confidence interval plots generated correctly
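
The multiple_testing_correction and sample_size_check results above can be illustrated with a minimal Python sketch. This is not the ab-test-analyzer skill's actual code: the function names, the choice of a pooled two-proportion z-test, and the default alpha and power values are all assumptions made for illustration. Only the use of a Bonferroni correction and a sample-size warning comes from the report.

```python
import math
from statistics import NormalDist

# Hypothetical helpers; names and defaults are illustrative assumptions,
# not the skill's real API.

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # degenerate case (all or no conversions): no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_significant(p_values, alpha=0.05):
    """Compare each p-value against the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def min_sample_size_per_arm(baseline, mde, alpha=0.05, power=0.8):
    """Approximate per-arm sample size to detect an absolute lift `mde`
    over `baseline` with a two-sided test (normal approximation)."""
    treated = baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + treated * (1 - treated)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


def sample_size_warning(n_per_arm, baseline, mde):
    """Return a warning string if the arm is smaller than the estimate, else None."""
    required = min_sample_size_per_arm(baseline, mde)
    if n_per_arm < required:
        return f"sample size {n_per_arm} is below the estimated {required} per arm"
    return None


if __name__ == "__main__":
    # Control vs. three variants: (conversions, visitors) per arm (made-up numbers).
    control = (100, 1000)
    variants = [(150, 1000), (115, 1000), (104, 1000)]
    p_values = [two_proportion_p_value(*control, *v) for v in variants]
    print(bonferroni_significant(p_values))
    print(sample_size_warning(1000, baseline=0.10, mde=0.02))
```

With m comparisons, Bonferroni tests each one at alpha / m, which keeps the family-wise error rate at or below alpha at the cost of some statistical power. Less conservative procedures exist, but the report names Bonferroni, so the sketch stays with it.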