Managing Drupal as a SaaS Offering
My Drupal site has been deployed, now what do I do?
Deploying a Drupal website is a fairly straightforward process. Typically, using SVN for example, you commit from your development repositories to a repository that lives in your webroot. Once you've deployed, remember to flush your cache if you are using Drupal's built-in performance features. Depending on your website load, web server responses may be slow at first while the cache is refilled. At Colocube, since most of our customers use extensive caching and serve all-dynamic content, we've written some tools that pre-warm the web server(s) so that end users don't feel any sluggishness.
This process can be as simple as scripting a tool such as ab (ApacheBench) or Siege to run through your website's links. For larger websites, a validated sitemap lets you make sure that every section and module with cacheable content has been sufficiently warmed up. In our case, we typically take half of the available servers out of rotation, update the content and warm them up, then put them back behind the web proxy while taking the other half out for the same process.
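The sitemap-driven warm-up and the half-at-a-time rotation described above could be sketched roughly as follows. This is a minimal illustration, not Colocube's actual tooling: the function names (`urls_from_sitemap`, `split_into_halves`, `warm`) are hypothetical, the sitemap is assumed to follow the standard sitemaps.org schema, and removing servers from the proxy or pointing requests at one specific backend is left out because it depends on your proxy.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def split_into_halves(servers):
    """Split the server pool into two batches for a rolling update."""
    middle = (len(servers) + 1) // 2
    return servers[:middle], servers[middle:]


def fetch_status(url, timeout=10):
    """Fetch a URL once and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def warm(urls, fetch=fetch_status):
    """Request each URL once; return the URLs that failed so they can be retried."""
    failed = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            failed.append(url)
            continue
        if status != 200:
            failed.append(url)
    return failed
```

In practice you would call `split_into_halves` on your server list, take the first batch out of rotation, update it, run `warm` against it, put it back, and repeat for the second batch. Passing a custom `fetch` makes it possible to direct requests at a particular backend or to dry-run the logic.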
Moving beyond the deployment phase, however, can sometimes feel like an uncertain transition. Clients typically raise the same sets of questions, such as "How do I gauge my website's performance?", "When do I need to upgrade to boost my web server's performance?" or even "How can I see what is reaching a limit?" Thankfully, there are some very straightforward ways to measure these types of metrics using tools readily available to anyone.
For overall server performance, a useful tool that can give a detailed history of your performance is Cacti. Other tools can take the same measurements and offer other functionality; however, Cacti is highly modular and is based on open standards, with hundreds of plugins. At Colocube, we use Cacti extensively to provide dashboards to our customers as well as a master dashboard for watching our own server farms. With Cacti, it's easy to see near real-time statistics about nearly every facet of server performance. For Drupal specifically, we monitor Apache, MySQL, and server OS load. By matching peaks and trends over time and correlating these metrics with time-of-day visitor traffic, our customers get an accurate window into their site and server performance, and can gauge when physical server performance or application performance is approaching or exceeding the limits of their environment.
Measuring server and application performance is important; however, it is equally important to monitor website availability and to track when and why downtime occurs. In a Drupal environment (though the same applies to other applications), the most common issue we run into is a module that causes segmentation faults in Apache due to memory starvation. Less common, but still a potential issue depending on the hardware deployed, is swapping to disk due to lack of memory, or a failing process that causes 404 or 503 errors to be shown to site visitors.
At Colocube, we've made extensive use of an open source project called Opsview, a monitoring solution for just about any server and process you can imagine. By using Opsview and giving customers access to their services through their own portal, all server issues can be tracked in real time, and corrective measures, when warranted, can be executed immediately.
For example, if Apache is still running but has segfaulted, we have a mechanism that uses an Opsview Agent to restart that Apache process gracefully. Downtime due to Apache failure is kept to a minimum (typically less than 1 minute, since Opsview scans all systems every minute). In another example, we track the total connections to Apache and set soft thresholds that notify the customer and ourselves if their site traffic is continually pushing up against the hard limit on Apache processes for their website application. With a system that not only identifies this threshold but also alerts externally and provides a histogram of repeated forays to the limit, Colocube and its customers are able to make intelligent decisions about how a Drupal application is performing, and to accurately predict when additional resources will be required to scale the application further.
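To make the soft-threshold idea concrete, here is a small sketch of how connection samples could be classified against a soft and a hard limit, and how repeated forays to the limit could be counted per hour of day. It is an illustration only and does not use or reflect the Opsview API; the function names and the idea of feeding it (timestamp, connection count) pairs are assumptions, and the hard limit would normally correspond to whatever cap your Apache configuration sets (for example MaxClients or MaxRequestWorkers).

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def classify(connections: int, soft_limit: int, hard_limit: int) -> str:
    """Return 'ok', 'warning' (soft threshold crossed) or 'critical' (hard limit reached)."""
    if connections >= hard_limit:
        return "critical"
    if connections >= soft_limit:
        return "warning"
    return "ok"


def breaches_by_hour(
    samples: Iterable[Tuple[datetime, int]], soft_limit: int
) -> Dict[int, int]:
    """Count, per hour of day, the samples at or above the soft threshold."""
    counts = Counter()
    for taken_at, connections in samples:
        if connections >= soft_limit:
            counts[taken_at.hour] += 1
    # Sort by hour so the result reads like a simple histogram.
    return dict(sorted(counts.items()))
```

A site whose histogram shows the same hours filling up day after day is a candidate for more capacity or tuning, whereas an isolated spike may not justify any change.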
Putting these two key measurement systems to work provides a win-win situation for both Colocube and its customer base. Too often, we consult with customers who in the past have been asked to pay ever-increasing hosting fees to scale their application, with the host really taking a "throw more hardware and it will work" approach to scaling up their websites and applications.
By taking a step back and working first on optimizing the OS and the LAMP stack that supports Drupal and the other applications we host, and then putting in place a robust metrics monitoring system, we both inform Colocube staff when there is an issue and provide clients with a clear and concise measurement of their applications. In the process, we've been able to shift away from a wasteful, time-intensive (as well as hardware- and cash-intensive) approach to one that empowers our clients to make these important business decisions fully armed and fully informed.
Most importantly, by using industry-standard, open source tools, we empower our customers to "take a peek under the hood" if they so desire, and we are able to operate in a full-disclosure, open-handed fashion. This has allowed us to work more closely with our clients while increasing the trust they have both in the Drupal stack and its performance, and in Colocube's ability to advise them on intelligent ways to expand their application in a measured and controlled fashion.

