Using Codex to investigate PagerDuty alerts
Codex CLI is seriously smart (gpt 5.5 medium, maybe even better than Claude Code). I just told it to "get my PagerDuty incidents and find the root cause", and it:
- read the PagerDuty skill in our company skills repo
- used it to list my team's active incidents
- checked the missing S3 success files
- traced the alerts back through the Airflow DAGs
- queried live Airflow task state in prod
- checked S3 timestamps for upstream sludge success files
- used Trino to verify the alert
- concluded that the main issue was late sludge log completion delaying downstream metrics, and that the alert itself was a real low-volume threshold miss, not missing data
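The useful part of that conclusion is that it separates two questions: did the upstream data land late (or not at all), and was the alerted metric genuinely below threshold once the data was there? Below is a minimal sketch of that triage logic. It is not how Codex works internally, and the names (`AlertEvidence`, `triage`, the field names) are hypothetical; it just assumes you have already collected the upstream success-file timestamp from S3, the time it was expected by, and the metric value and threshold (e.g. from a Trino query).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlertEvidence:
    # Hypothetical inputs gathered during an investigation.
    upstream_success_at: Optional[datetime]  # S3 success-file timestamp; None if the file is missing
    expected_by: datetime                    # when the upstream success file should have landed
    metric_value: float                      # value the alert fired on (e.g. a row count from Trino)
    threshold: float                         # alert threshold for that metric


def triage(ev: AlertEvidence) -> list[str]:
    """Return findings that separate data delays from genuine threshold misses."""
    findings = []
    # Step 1: was the upstream data missing or late?
    if ev.upstream_success_at is None:
        findings.append("missing upstream data")
    elif ev.upstream_success_at > ev.expected_by:
        findings.append(f"upstream late by {ev.upstream_success_at - ev.expected_by}")
    # Step 2: was the metric actually below threshold?
    if ev.metric_value < ev.threshold:
        if ev.upstream_success_at is None:
            findings.append("threshold miss, possibly caused by missing data")
        else:
            # Data is present, so the low value is real rather than an artifact.
            findings.append("real low-volume threshold miss")
    return findings


if __name__ == "__main__":
    # Illustrative values only: upstream landed 90 minutes late and volume was low.
    ev = AlertEvidence(
        upstream_success_at=datetime(2024, 1, 2, 7, 30),
        expected_by=datetime(2024, 1, 2, 6, 0),
        metric_value=850,
        threshold=1000,
    )
    print(triage(ev))  # ['upstream late by 1:30:00', 'real low-volume threshold miss']
```

Returning a list rather than a single label matters here, because (as in this incident) a late upstream and a real threshold miss can both be true at once.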
Pretty amazing.