Skill Evaluation

claw skill evaluate ab-test-analyzer

Score Breakdown

Safety: 96
Executability: 86
Completeness: 82
Maintainability: 84
Cost: 88

Check Results

Evaluation Checks

Overall Score: 85

input_validation: PASS

Data format and sample sizes validated

statistical_correctness: PASS

Calculations verified against reference implementations

multiple_testing_correction: PASS

Bonferroni correction applied for multi-variant tests
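
For readers unfamiliar with the correction named in this check, here is a minimal sketch of how Bonferroni adjustment works in general. It is not the skill's own code: the function name `bonferroni_adjust`, the input shape (one p-value per variant comparison), and the 0.05 default are assumptions made for illustration.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return (adjusted_p, is_significant) for each comparison.

    Hypothetical helper: multiplies every p-value by the number of
    comparisons (capped at 1.0) and compares the result with alpha.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)  # Bonferroni: scale by number of tests
        results.append((adjusted, adjusted <= alpha))
    return results


# Three variants compared against one control (made-up p-values).
for adjusted, significant in bonferroni_adjust([0.01, 0.04, 0.03]):
    print(f"adjusted p = {adjusted:.2f}, significant = {significant}")
```

With three comparisons, only a raw p-value at or below roughly 0.0167 survives at the 0.05 level, which is why the correction matters once a test has more than two arms.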

sample_size_check: PASS

Warns when sample size is insufficient
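
The report does not say how the skill decides that a sample is insufficient. One common approach, sketched here under that caveat, is to compare the observed group size with the size required by the standard two-proportion power formula. The function names, the 0.05 significance level, and the 0.8 power target are all assumptions.

```python
import math
from statistics import NormalDist


def required_per_group(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion test."""
    if p_base == p_variant:
        raise ValueError("rates must differ to define a detectable effect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    n = (z_alpha + z_power) ** 2 * variance / (p_base - p_variant) ** 2
    return math.ceil(n)


def sample_size_warning(n_per_group, p_base, p_variant):
    """Return a warning string if the group is too small, else None."""
    needed = required_per_group(p_base, p_variant)
    if n_per_group < needed:
        return f"insufficient sample: {n_per_group} per group, need about {needed}"
    return None


# Hypothetical scenario: 10% baseline rate, hoping to detect a lift to 12%.
print(sample_size_warning(1000, 0.10, 0.12))
```

The formula uses the normal approximation, so it is a planning estimate and not an exact threshold.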

visualization: PASS

Confidence interval plots generated correctly
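
The plotting library and interval method the skill uses are not stated in this report. As a rough illustration of the numbers such a plot would draw, the sketch below computes a Wald (normal-approximation) confidence interval for the difference between two conversion rates. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald interval for (rate_b - rate_a), given conversions and group sizes."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Made-up counts: 100/1000 conversions in control, 120/1000 in the variant.
low, high = diff_confidence_interval(100, 1000, 120, 1000)
print(f"95% CI for the lift: [{low:.4f}, {high:.4f}]")
```

An interval that spans zero, as this made-up example does, would appear on a plot as an error bar crossing the no-effect line.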