In a recent blog post, Kevin Miller, Web Programming Specialist at California State University, Monterey Bay (CSUMB), mentioned how they monitor their various web services, including how they use Pingdom to keep things running. We took some time to talk to Kevin to find out more about CSUMB, their online services, and how they use Pingdom website monitoring.
Q: Kevin, can you start by giving us an overview of CSUMB?
A: Cal State Monterey Bay is the second-youngest campus in the California State University system. We have a variety of academic programs and focus on undergraduate education, community service, and democratic participation.
Unlike other larger or older universities, we have a unified web infrastructure maintained by a single department that provides development, operations, training, user experience, and project management under one roof. This means we can focus on large projects that impact every user on campus, which has translated to our users being heavily engaged with our various sites.
Q: How important are online and web services to a university like CSUMB?
A: Our campus intranet is actually the most heavily trafficked site on campus, and most students log in at least four times a day during the semester. Since this is the starting point for everything from email to finding homework, our users are extremely sensitive to downtime or degraded service.
Q: What kind of things do you monitor with Pingdom?
A: We are mostly monitoring pure up/downtime with Pingdom, plus a transaction test that exercises our single sign-on system by logging in to our intranet and clicking through to different services.
For example, we have a legacy system that tracks all our students’ dining cards through an old Windows COM object interface. We wrote a wrapper around it to expose it as a standard REST service, which our intranet calls so our students can easily see how many meal blocks they have left. This involves a lot of different fragile interfaces and traverses several network zones, so transactions can give us a much better picture of whether the end-user is getting real data. With traditional monitoring we could see that these services were running, or whether our network interfaces were up, but we didn’t have end-user testing that essentially checked the entire chain.
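Editor's note: the idea of checking the entire chain rather than each service in isolation can be sketched as follows. This is a minimal illustration, not CSUMB's or Pingdom's implementation. The step names and the `run_chain` helper are hypothetical, and each step is a stand-in function where a real check would make an HTTP request.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ChainResult:
    ok: bool                    # True only if every step passed
    failed_step: Optional[str]  # name of the first failing step, if any
    completed: List[str]        # steps that passed before the failure


def run_chain(steps: List[Tuple[str, Callable[[], bool]]]) -> ChainResult:
    """Run the steps in order and stop at the first one that fails."""
    completed: List[str] = []
    for name, check in steps:
        try:
            passed = check()
        except Exception:
            # An error in a step counts as a failure of that step.
            passed = False
        if not passed:
            return ChainResult(False, name, completed)
        completed.append(name)
    return ChainResult(True, None, completed)


# Hypothetical stand-ins for the real requests (assumptions, not real endpoints).
steps = [
    ("sso_login", lambda: True),
    ("intranet_home", lambda: True),
    ("meal_balance_rest", lambda: False),  # simulate the legacy wrapper failing
]

result = run_chain(steps)
print(result)
```

Because the run stops at the first failing step, the result points at the broken link in the chain even when every individual service still reports that it is running.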
Q: Overall, has monitoring your sites and servers helped you keep things running better, with as much uptime as possible?
A: Pingdom has helped us be more cognizant of downtime, especially during off hours, when we still get notifications but don’t have dedicated staff. We have also posted our uptime on our IT alerts page, which helps our end-users know that we are monitoring their issues. It’s also been nice to give leadership a solid uptime percentage.
This kind of end-user monitoring has also helped us catch problems that might be on the edge of our network, beyond traditional tools like Nagios. While I can get very detailed information on our VM infrastructure, what our CPUs, memory, or processes look like, or whether or not a port was open, that doesn’t let you know if the user is actually getting a page. We had an issue eight months ago where our load balancers were randomly failing in a very esoteric way so that people on campus did not notice the downtime, but off-campus users did.
Q: Can you think of any specific example of a situation where Pingdom has alerted you that something has gone wrong and you’ve been able to fix it, perhaps before students or staff even noticed?
A: We had an incident a couple of months ago where I was sitting in our office and noticed a flash of red on our dashboard for our intranet. Luckily it was just a small misconfiguration that had been recently deployed and we could roll back immediately, so thankfully I could avoid my phone ringing off the hook.
Q: You mentioned in the blog post that you use our API to display your monitoring checks on an iPad with the Status Board app. What does the dashboard on the iPad actually look like? Perhaps we could even see an example.
A: We’re using a table widget for Status Board whose data we expose via a simple PHP Silex app. We went with this format because it meant the checks were nice and big, and the red indicator was easy to read (for this shot I actually faked a failed check; we weren’t experiencing problems at the time).
Editor’s note: You can read more about how to use Pingdom with the Status Board iPad app here.
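Editor's note: for readers who want a feel for what such a table endpoint might return, here is a rough sketch in Python rather than Kevin's PHP Silex app. It assumes the widget can display a plain HTML table, and the shape of the check records (`name` and `status` keys) is hypothetical rather than the actual Pingdom API response format.

```python
from html import escape


def render_status_table(checks):
    """Render a list of check records as a simple HTML table.

    Each record is assumed to be a dict with 'name' and 'status' keys,
    where status is 'up' or anything else (treated as a failure).
    """
    rows = []
    for check in checks:
        is_up = check["status"] == "up"
        color = "green" if is_up else "red"  # failures stand out in red
        rows.append(
            f'<tr><td>{escape(check["name"])}</td>'
            f'<td style="color: {color}">{escape(check["status"].upper())}</td></tr>'
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Made-up sample data for illustration only.
sample_checks = [
    {"name": "Intranet", "status": "up"},
    {"name": "Dining <REST>", "status": "down"},
]

table_html = render_status_table(sample_checks)
print(table_html)
```

Keeping one row per check, with the status in its own colored cell, matches the goal described above: large entries and a red indicator that is easy to spot from across the office.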
Q: Finally, do you have any suggestions for how we could make our services better?
A: I would like to see more in terms of simple transaction monitoring. For example, recording a user interaction with several different environments and having that run periodically would give us great feedback on end-user interactions that involve a lot of push state.
How do you use Pingdom?
We’d like to say a big thank you to Kevin for taking the time to talk to us for this interview. It’s great to see real-life examples of how our customers use our services (often in ways that we’ve not even considered).
How do you use Pingdom? Perhaps you’re doing something really cool and different. Let us know; we’d very much like to hear from you. Use the comments below or email us at email@example.com.
California State University, Monterey Bay, has more than 5,600 students divided between 23 undergraduate and eight graduate majors. As with other higher education institutions, online services are critical to the university’s various activities, including administration, teaching, and more.