Observability
List of platforms in use by PUL IT and/or DLS
- HoneyBadger
- see also pulibrary/pul-it-handbook/services/honeybadger.md
- Used for error tracking per-project.
- Also used for some uptime monitors (TODO: confirm whether these are still enabled and working)
- DataDog
- see also pulibrary/pul-it-handbook/services/datadog.md
- Used for APM and log aggregation
- VM-level metrics with dashboards (especially network volume, latency)
- Service health metrics and dashboard, powered by data from our health endpoints
- CheckMK
- see also pulibrary/pul-it-handbook/services/checkmk.md
- there’s also a /staging instance
- Monitoring and alerts, especially lower-level metrics
- SigNoz
- Currently in trial implementation
- Use “sign in with SSO”
- Provides log aggregation and APM
- Grafana
- Sign in with GitHub
- Set up by DLS to quickly implement project-specific metrics
- Temporary home for some stats and dashboards that may move to SigNoz
- Runs on Nomad
Scenarios
Are our prod sites up?
Check the status page that HoneyBadger builds for us.
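If the status page is unavailable, a quick manual check from your own machine can answer the same question. The following is only a minimal sketch, not an existing PUL tool: the `SITES` list is a hypothetical placeholder to be replaced with real production URLs, and the function names are illustrative. It sends a HEAD request to each URL and treats any 2xx or 3xx response as "up".

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the real production URLs.
SITES = ["https://example.org/"]


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status (4xx or 5xx).
        return error.code
    except (URLError, TimeoutError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None


def is_up(status):
    """Treat any 2xx or 3xx status as up; errors and no response as down."""
    return status is not None and 200 <= status < 400


def check_sites(urls, fetch=fetch_status):
    """Map each URL to True (up) or False (down)."""
    return {url: is_up(fetch(url)) for url in urls}
```

Calling `check_sites(SITES)` returns a dictionary mapping each URL to `True` or `False`. The `fetch` parameter exists so the logic can be exercised without network access. Some servers reject HEAD requests, so a "down" result here is worth confirming in a browser before raising an alert.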