Over the next few months, I will be writing a series of blog posts to share my experience monitoring a large-scale SQL replication environment. I previously worked for a company where we replicated nearly 40,000 objects from 50 publishers to over 100 subscribers. Due to the scale of our SQL replication topology, the Replication Monitor tool did not meet our needs; it simply became overloaded when trying to aggregate metrics from 10 or so remote distributors. In fact, we asked our database engineers to stop using Replication Monitor and instead rely on our custom monitoring via Grafana and PowerShell.
Below are the topics that I will cover and I hope these are beneficial for you and your team in managing SQL replication:
- Tracking the Replication Log Reader Latency
- Monitoring the distribution database size
- Monitor and Analyze Replication Undistributed Commands
- Tracking Replication Errors
- Checking the Snapshot Agent Status
- Reviewing Distribution Cleanup job
- Bringing it all together – Replication Dashboard
- Addressing “missing row at subscriber” errors automatically via PowerShell
If you have any SQL replication challenges that are not covered here… I would love to hear from you. Maybe I already experienced that and I might have a solution to share. I helped managed a replication environment with 40k objects for a few years, I seen a lot of stuff.
Leave a Reply