Jon Aquino's Mental Garden

Engineering beautiful software jon aquino labs | personal blog

Sunday, May 10, 2026

Using Codex to investigate PagerDuty alerts

Codex CLI is so smart (I'm running gpt 5.5 medium, and it may be better than Claude Code). I just told it to "get my PagerDuty incidents and find the root cause", and it:


- read the PagerDuty skill in our company skills repo

- used it to list my team's active incidents

- checked the missing S3 success files

- traced the alerts back through the Airflow DAGs

- queried live Airflow task state in prod

- checked S3 timestamps for upstream sludge success files

- used Trino to verify the alert

- concluded that the main issue was late sludge log completion delaying downstream metrics, while the alert was a real low-volume threshold miss, not missing data
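
That last step, telling "upstream finished late" apart from "data is missing" and from "volume is genuinely low", could be sketched roughly as below. This is only my illustration of the reasoning, not what Codex actually ran: the `Evidence` fields, the `diagnose` function, and all the sample values are hypothetical stand-ins for the S3 timestamps, deadlines, and Trino counts it gathered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Evidence:
    # Hypothetical fields standing in for what the investigation collected.
    upstream_success_at: Optional[datetime]  # S3 timestamp of the upstream success file, None if absent
    upstream_deadline: datetime              # when downstream jobs expected that file
    row_count: int                           # volume returned by the verification query
    low_volume_threshold: int                # threshold the alert fires below


def diagnose(e: Evidence) -> List[str]:
    """Turn the collected evidence into a list of plain-language findings."""
    findings = []

    # Timing: was the upstream success file absent, or just late?
    if e.upstream_success_at is None:
        findings.append("upstream success file missing")
    elif e.upstream_success_at > e.upstream_deadline:
        delay = e.upstream_success_at - e.upstream_deadline
        findings.append(f"upstream completed late by {delay}")

    # Volume: no rows at all is missing data; some rows below the
    # threshold is a genuine low-volume miss.
    if e.row_count == 0:
        findings.append("no data")
    elif e.row_count < e.low_volume_threshold:
        findings.append("real low-volume threshold miss, data present")

    return findings


if __name__ == "__main__":
    # Made-up values purely for illustration.
    sample = Evidence(
        upstream_success_at=datetime(2026, 5, 10, 8, 0, tzinfo=timezone.utc),
        upstream_deadline=datetime(2026, 5, 10, 6, 0, tzinfo=timezone.utc),
        row_count=120,
        low_volume_threshold=500,
    )
    for finding in diagnose(sample):
        print(finding)
```

Keeping the timing check and the volume check separate matters here, because in this incident both things were true at once: the upstream logs were late, and the alert itself was still a real threshold miss.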


Pretty amazing.
