A SolarWinds Scaling Strategy

Summary

Building a robust infrastructure on which to host SolarWinds instances is vitally important. It directly affects what can be monitored, as well as stakeholder confidence in the resulting outputs. The latter is a key consideration that too often takes a back seat.

A Bit of Background

A key part of building trust in a monitoring system is making sure the system resides on a trustworthy platform. It is hard to have confidence in a tool when you’re not sure about the underlying mechanics and their impact on results, much like depending on a structure built using crooked measuring sticks. That tongue-in-cheek metaphor captures the topic of this article: identifying a scaling strategy for hosting SolarWinds that inspires confidence and trust in the results produced by network, systems, and applications monitoring.

This article reviews the methods and techniques our consulting team uses when formulating recommendations for a robust and trustworthy SolarWinds implementation. The vendor provides a lot of valuable information that you should review alongside this article. Keep in mind that the values presented are current as of the article’s posting but may change. The approach is the key, and the specific values should be reviewed for relevance before making recommendations of your own.

General Approach

  • List SolarWinds Modules that are Needed
  • Understand the Impact of Poller Boundaries
  • Determine the Database Deployment Strategy Using RTO and RPO Business Requirements
  • Account for HA and AWS
  • Decide on Resource Sizing for Each Host Type

Picking SolarWinds Modules

Start by classifying what you need to monitor as networking, applications, or a combination of both. The basic question is “What kinds of things do I want to monitor?” Sort the answers into the aforementioned classifications. As a general recommendation, always include Network Performance Monitor (NPM) as the starting point for networking and Server and Application Monitor (SAM) for applications.

From this exercise, expand each classification type identified and select modules that include desired monitoring features.

For example:

  1. If type is equal to Networking:
    • NPM as a default
    • NCM if configuration management and backup features are required
    • NTA if Cisco NetFlow reporting is desired
  2. If type is equal to Applications:
    • SAM as a default
    • VMAN if advanced VMware infrastructure monitoring features are required
    • SRM if reporting on storage state is desired
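
The selection logic in the list above can be sketched as a small lookup. This is only an illustration: the feature keys (`config_management`, `netflow_reporting`, and so on) and the function name `pick_modules` are hypothetical labels invented for this example, not SolarWinds terminology.

```python
# Default module for each monitoring classification (from the list above).
DEFAULT_MODULES = {"networking": "NPM", "applications": "SAM"}

# Optional modules keyed by hypothetical feature names made up for this sketch.
OPTIONAL_MODULES = {
    "networking": {
        "config_management": "NCM",
        "netflow_reporting": "NTA",
    },
    "applications": {
        "vmware_monitoring": "VMAN",
        "storage_reporting": "SRM",
    },
}


def pick_modules(requirements):
    """Return the module list for a dict of classification -> wanted features."""
    modules = []
    for classification, features in requirements.items():
        # Every classification starts with its default module.
        modules.append(DEFAULT_MODULES[classification])
        # Add one module per requested feature, in a stable order.
        for feature in sorted(features):
            modules.append(OPTIONAL_MODULES[classification][feature])
    return modules


wanted = {"networking": {"netflow_reporting"}, "applications": set()}
print(pick_modules(wanted))  # ['NPM', 'NTA', 'SAM']
```

Keeping the defaults separate from the optional modules mirrors the "default first, then expand" approach described above.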

There are, of course, a few outliers like DPA and LEM, but these modules are not part of the Orion Core and hence would be scoped separately from a standard SolarWinds instance.

Poller Boundaries

Jumping right in…

  • Every SolarWinds instance has a Main Polling Engine (MPE).
  • Additional Polling Engines (APE) can be added to increase the number of objects that can be monitored.

With those considerations in mind, define Poller Boundaries. This term describes the limits that, once reached, cause default and custom polling thresholds to be throttled programmatically. Each boundary applies per polling engine, whether that is the MPE or an APE. These are hard limits set by SolarWinds: neither the limits nor the throttling behavior can be modified.

Having polling thresholds throttled unexpectedly will erode trust in your monitoring system, which is the reason for defining Poller Boundaries.

The chart below lists several SolarWinds modules and their default boundaries. The “Max Limit” is the upper boundary for a single SolarWinds instance.

Keep in mind that these numbers are subject to change as each module is updated to newer versions. Always check the SolarWinds website for current values.

Module         Single Poller / Instance  Measured By                               Max Limit
NPM            12K                       Elements (Node / Interface / Volume)      400K
SAM            10K                       Components (Process / Service / Counter)  150K
NTA            50K                       Flows per second                          300K
NCM            10K                       Devices                                   30K
Agents         1K                        Per Polling Engine                        N/A
SRM            40K                       LUNs                                      160K
UDT            100K                      Ports                                     500K
VMAN           3K                        Virtual Machines                          N/A
WPM            12                        Recordings per Player                     Based on Complexity
IPAM           1M                        IPs                                       3M
EOC            30                        SolarWinds Instance                       N/A
VNQM (1 of 2)  5K                        IP SLA Operations per Hour                15K per day
VNQM (2 of 2)  20K                       Calls per Hour                            200K per day

For each module you plan to use, estimate the number of objects you expect to monitor. Use the chart above to determine what each module counts as a monitored object. Then divide your estimate by the Single Poller / Instance value found in the chart, and round up. The result is the number of polling engines needed for that module.

Consider an example using NPM, where you have estimated a need to monitor 50K elements.

  • First, 50K is well within the single instance max limit of 400K.
  • Dividing 50K by the 12K single-poller limit gives roughly 4.2, which rounds up to 5 polling engines.
  • The MPE counts as one polling engine.
  • You need an additional 4 APEs to make 5.

In summary, at a minimum, you would need 1 instance, 1 MPE (which is included with each instance), and 4 APEs to meet your NPM monitoring requirements without crossing any Poller Boundaries. Keep in mind this is an article on scaling. There are scenarios requiring multiple instances even when a single instance is within Poller Boundaries. That, however, would be an Architecture discussion and is out of scope for this article.
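
The NPM walk-through above can be expressed as a short calculation. This is a minimal sketch: the limits are copied from the chart in this article and should be checked against current SolarWinds values, and the names `POLLER_LIMITS` and `polling_engines_needed` are made up for illustration.

```python
import math

# Limits copied from the chart in this article; verify against current values.
POLLER_LIMITS = {
    "NPM": {"per_engine": 12_000, "max_per_instance": 400_000},
    "SAM": {"per_engine": 10_000, "max_per_instance": 150_000},
}


def polling_engines_needed(module, objects):
    """Return the polling engine count (MPE included) for one module."""
    limits = POLLER_LIMITS[module]
    if objects > limits["max_per_instance"]:
        # Beyond this point a single instance is not enough.
        raise ValueError(f"{objects} exceeds the {module} single-instance max limit")
    # Divide by the per-engine limit and round up; the MPE always exists.
    return max(1, math.ceil(objects / limits["per_engine"]))


engines = polling_engines_needed("NPM", 50_000)
print(f"polling engines: {engines}, of which APEs: {engines - 1}")
# polling engines: 5, of which APEs: 4
```

Raising an error above the Max Limit, rather than silently returning a large number, flags the case where the conversation shifts from scaling to architecture.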

Finally, perform the same basic steps for each identified module. If, say, SAM results in 4 polling engines needed, then the total polling engine count remains 5 to accommodate NPM.
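
Following the rule just described, the total is driven by the most demanding module rather than by the sum of all modules. The sketch below assumes the per-engine limits from the chart in this article; the 40,000 SAM components figure is a hypothetical estimate chosen so that SAM needs 4 polling engines.

```python
import math

# Per-engine limits from the chart in this article; verify current values.
PER_ENGINE_LIMIT = {"NPM": 12_000, "SAM": 10_000}


def total_polling_engines(estimates):
    """Return (total engines, per-module engines) for a dict of object estimates."""
    per_module = {
        module: max(1, math.ceil(count / PER_ENGINE_LIMIT[module]))
        for module, count in estimates.items()
    }
    # The module with the highest requirement sets the total.
    return max(per_module.values()), per_module


# 40_000 SAM components is a made-up estimate for this example.
total, per_module = total_polling_engines({"NPM": 50_000, "SAM": 40_000})
print(per_module)  # {'NPM': 5, 'SAM': 4}
print(total)       # 5
```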

Database

SolarWinds does not provide direct support for Microsoft SQL Server. It is required for the installation of SolarWinds, but it is not a product they officially support. This is often a point of confusion and consternation. Make sure you have someone on staff who is qualified to handle the database side of things, because this also means you’ll need to support SQL Server High Availability solutions yourself if you want failover and X-scaling capabilities.

Choose a configuration that meets your company’s RTO and RPO business requirements. At a minimum, you likely want to use Always On Basic Availability Groups, which replace the deprecated SQL Server Database Mirroring feature. This is a complicated decision to make, but it is likely the most critical one. SolarWinds is built entirely around the database and hence is extremely dependent on it.

Account for HA and AWS

If company RTO standards require continuous service during failure events, then SolarWinds High Availability Pools will be required. Like any X-scale solution based on duplication, this means the host count is doubled. However, only one High Availability Pool license is required per pair. Technically this is getting into Architecture and Licensing, but it is an important consideration that crosses into Scaling and deserves mention in this article.

Continuing with our NPM example, five polling engines would mean 10 hosts (2 x MPE, 8 x APE) and 5 High Availability Pool licenses.
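
The doubling arithmetic can be sketched as follows. It assumes, as described above, that every polling engine is paired with one standby host and that each pair consumes one High Availability Pool license; the function name `ha_footprint` is hypothetical.

```python
def ha_footprint(polling_engines):
    """Return host and license counts when every polling engine is in an HA pool."""
    if polling_engines < 1:
        raise ValueError("an instance always has at least the MPE")
    apes = polling_engines - 1  # everything except the single MPE
    return {
        "mpe_hosts": 2,                        # the MPE plus its standby
        "ape_hosts": 2 * apes,                 # each APE plus its standby
        "total_hosts": 2 * polling_engines,
        "ha_pool_licenses": polling_engines,   # one license per pair
    }


print(ha_footprint(5))
# {'mpe_hosts': 2, 'ape_hosts': 8, 'total_hosts': 10, 'ha_pool_licenses': 5}
```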

If you anticipate more than 20-25 concurrent users accessing the web console, then an Additional Web Server (AWS) is recommended. This is not a hard rule because much depends on the sophistication of configured dashboards. However, this recommendation is a good rule of thumb and is supported by SolarWinds.
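
One way to turn this rule of thumb into a repeatable check is shown below. Treating 20 and 25 as the lower and upper edges of a "judgment call" band is our interpretation of the 20-25 range, not a SolarWinds-defined threshold, and the function name is hypothetical.

```python
def additional_web_server_advice(concurrent_users, low=20, high=25):
    """Classify the need for an Additional Web Server from expected concurrent users."""
    if concurrent_users > high:
        return "recommended"
    if concurrent_users >= low:
        # Inside the band, dashboard sophistication decides the outcome.
        return "consider, depending on dashboard complexity"
    return "not needed"


for users in (10, 22, 40):
    print(users, "->", additional_web_server_advice(users))
# 10 -> not needed
# 22 -> consider, depending on dashboard complexity
# 40 -> recommended
```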

System Resources (CPU, Memory, Storage, Network)

SolarWinds provides resource allocation guidelines based on Small, Medium, Large, Extra Large, and Amazon Web Services deployments. These guidelines are important and should be followed to the letter if at all possible. Our recommendation is to classify the size of your deployment based on these guidelines and deploy the solution using the resource quantities provided in the guide.

https://support.solarwinds.com/Success_Center/Orion_Platform/Orion_Documentation/Orion_Platform_Administrator_Guide/Orion_multi-module_system_guidelines

The guide explains resource quantities based on the number of modules plus licenses purchased. To make the conversion to Poller Boundaries, take the number of objects per module from the previous steps and apply them to the matching license quantity in the guide.

Using the NPM example again, 50k objects would require an NPM SLX license. In the guide this qualifies as a Large deployment.
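
Matching an object estimate to a license tier can be sketched as a simple lookup. The tier names and caps below (SL100, SL250, SL500, SL2000, then SLX) are an assumption based on the commonly published NPM license tiers and are not taken from this article, so confirm them against current SolarWinds licensing before relying on them.

```python
# Assumed NPM license tiers and their element caps; verify before use.
NPM_TIERS = [("SL100", 100), ("SL250", 250), ("SL500", 500), ("SL2000", 2000)]


def npm_license_tier(elements):
    """Return the smallest assumed NPM tier that covers the element estimate."""
    for name, cap in NPM_TIERS:
        if elements <= cap:
            return name
    # Anything above the largest capped tier falls into SLX.
    return "SLX"


print(npm_license_tier(50_000))  # SLX
```

The resulting tier is what you look up in the guide to find the matching deployment size.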

The primary host types you’re sizing for include:

  • Main Polling Engine and Alerts Engine
  • Additional Polling Engine
  • Additional Web Server
  • Database Host

Wrapping Up

Using the above techniques, it should be possible to accurately describe the resources needed for building a SolarWinds instance that is ready to grow and scale with your company. Next steps might include refining the solution further by architecting an infrastructure that takes into account other important factors such as latency between sites. That, however, is a different topic altogether.

Need more information on Scaling, Architecture or Licensing? Let’s talk! Monalytic is an authorized SolarWinds partner specializing in project-based services, managed services, training, licensing, and maintenance renewals.
