Setup [2 weeks]
- Define the evaluation protocol with @Aroraakhil and create an experiment book for each task
- Based on sample data, estimate the volume of outgoing requests to GROQ. This estimate will drive the budget.
- Set up API calls
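
The request-volume and budget estimate could be sketched roughly as below. This is a minimal illustration, not an agreed method: the function name, its parameters, and all numbers in the usage example are hypothetical placeholders. The per-token price in particular is an assumption and should be replaced with the provider's current pricing.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BudgetEstimate:
    total_requests: int
    total_tokens: float
    total_cost_usd: float


def estimate_budget(
    sample_tokens_per_request: Sequence[int],
    n_items: int,
    requests_per_item: int,
    usd_per_million_tokens: float,
) -> BudgetEstimate:
    """Extrapolate request volume and cost from a small measured sample.

    sample_tokens_per_request: tokens (prompt + completion) observed for each
        request in the sample run.
    n_items: number of items in the full experiment (assumed known).
    requests_per_item: how many API calls each item triggers (assumed constant).
    usd_per_million_tokens: placeholder blended price, NOT a real quote.
    """
    if not sample_tokens_per_request:
        raise ValueError("sample must contain at least one request")

    # Average tokens per request, measured on the sample.
    mean_tokens = sum(sample_tokens_per_request) / len(sample_tokens_per_request)

    # Scale up to the full experiment.
    total_requests = n_items * requests_per_item
    total_tokens = total_requests * mean_tokens
    total_cost = total_tokens / 1_000_000 * usd_per_million_tokens

    return BudgetEstimate(total_requests, total_tokens, total_cost)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    est = estimate_budget(
        sample_tokens_per_request=[100, 200, 300],
        n_items=1000,
        requests_per_item=2,
        usd_per_million_tokens=1.0,
    )
    print(est)
```

The sketch assumes a single blended price for input and output tokens and a constant number of requests per item. If the tasks differ in prompt length or call count, running it once per task and summing the results would give a better estimate.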
Full experiments [8 weeks]


