As IT infrastructure demands grow, so does the need to monitor and control the IT environment. Information security professionals are dealing with increasingly high rates of security-significant events within their SIEM infrastructures, which presents serious challenges to both engineers and architects.

 

For example, let’s take a hypothetical SaaS provider whose in-house applications generate close to 10 billion events per day (10G EPD). We’ll assume that this organization already possesses some basic centralized log gathering capabilities (such as a dedicated syslog aggregator system or a centralized event database). In this example, we will base all the numbers and calculations on solutions developed by ArcSight™, including Logger, Connector Appliance and ESM (Enterprise Security Manager).

Is It Worth the Effort?

The reasons for deploying SEM/SIEM solutions should be obvious. The advantages of SIEM solutions over basic log management are significant. As important as it is to maintain activity logs for critical systems and applications, simply storing them provides no capability to detect and prevent attacks or abuse of the infrastructure. SIEM solutions allow for the direct correlation of activity across all applications and platforms (including social sites and mobile devices) and provide powerful security response capabilities, such as the ability to identify new attacks or abuse vectors before they become a problem.

Log Centralization

Maintaining a centralized log repository is always a good first step. My personal experience is that even basic log gathering across the enterprise can present a strenuous internal effort. A log aggregation solution may normalize events by converting them into a common schema or it might be closer to a straight-text repository like a syslog. In any case, the conversion of the log records into a common SIEM vendor schema would be the next step towards securing your infrastructure.

Normalization

The level of effort that’s required to build the necessary "connectors" for your infrastructure will vary depending on the complexity of the initial record format. Hopefully the developer has done the right thing and stored the events under a well-defined schema in an easily accessible database. Such an approach eases the effort of mapping the developer's fields into the schema used by the SIEM vendor, such as CEF. However, things can quickly become difficult if, for example, inconsistently formatted multi-line text records are used. Another challenge for any connector developer is extensive database-driven applications, such as SAP or Oracle Applications. The queries required to pull basic event log data can get incredibly complex, and the custom development that is common in such environments adds another layer of complexity to the task.
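To make the field-mapping step concrete, here is a minimal sketch of converting one well-structured application record into a CEF line. The record field names (event_id, event_name, client_ip, user, detail) and the vendor/product values are hypothetical; the pipe-delimited header and the src, suser and msg extension keys follow the published CEF layout. A real connector would handle far more fields and error cases.

```python
# Hypothetical mapping from application field names to CEF extension keys.
FIELD_MAP = {"client_ip": "src", "user": "suser", "detail": "msg"}


def _escape_header(value):
    # In the CEF header, backslashes and pipes must be escaped.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # In CEF extensions, backslashes, equals signs and newlines must be escaped.
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace("=", "\\=")
        .replace("\n", "\\n")
    )


def to_cef(record, vendor="ExampleSaaS", product="BillingApp", version="1.0"):
    """Build one CEF line from a dict-shaped application record."""
    header = [
        "CEF:0",
        vendor,
        product,
        version,
        record["event_id"],
        record["event_name"],
        record.get("severity", 5),  # assumed default severity
    ]
    header_text = "|".join(_escape_header(part) for part in header)
    extension_text = " ".join(
        f"{cef_key}={_escape_extension(record[app_key])}"
        for app_key, cef_key in FIELD_MAP.items()
        if app_key in record
    )
    return f"{header_text}|{extension_text}"


if __name__ == "__main__":
    sample = {
        "event_id": "100",
        "event_name": "Login failed",
        "client_ip": "10.0.0.1",
        "user": "jdoe",
    }
    print(to_cef(sample))
```

When the source is a clean schema like this, the connector is mostly a lookup table. The multi-line text case mentioned above is harder because the records have to be reassembled before any mapping can happen.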

You can expect to be quoted a week of effort per application by ArcSight Professional Services or most VARs. If most of our sample provider's apps follow similar (or the same) log formats with common field designations, only one custom "connector" might be required, reducing development costs and leaving more time for tuning and streamlining the log conversion. A dedicated server or an appliance can host multiple connectors, and connector management solutions (e.g. ArcSight's Connector Appliance) are available if the number of connectors demands it.

Filtering and Aggregation

No self-respecting log management solution would work effectively without these two critical capabilities. Depending on the repetitiveness and usefulness of the incoming event data, filtering and aggregation can deliver mixed results. A Unix system log is an example of how effective good filtering capabilities can be. We've seen Unix syslog filtered down to 5-10% of useful events (unless you're keen on analyzing sendmail events). Aggregation, on the other hand, works best on feeds like firewalls or a misconfigured IDS, which can deliver a high flow of very similar events. Firewall logs can be aggregated down to about 60-70%, depending on how liberal your egress firewall policies are. Overall, the fewer useful fields your event contains, the better your chances are for effective aggregation.
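The two operations can be sketched roughly as follows. This is an illustration of the idea, not how ArcSight connectors implement it: the event shape (ts, program, src, dst, action), the drop list and the 60-second window are all assumptions made for the example.

```python
def filter_events(events, drop_programs=("sendmail",)):
    """Filtering: discard events from sources considered uninteresting."""
    return [e for e in events if e["program"] not in drop_programs]


def aggregate_events(events, key_fields, window_seconds=60):
    """Aggregation: collapse events that share the same key fields and fall
    into the same time window into a single event carrying a count."""
    buckets = {}
    for event in events:
        window = event["ts"] // window_seconds
        key = (window,) + tuple(event[f] for f in key_fields)
        if key in buckets:
            buckets[key]["count"] += 1
            buckets[key]["last_ts"] = event["ts"]
        else:
            # Copy the first event of the group and start counting.
            buckets[key] = dict(event, count=1, last_ts=event["ts"])
    return list(buckets.values())


if __name__ == "__main__":
    raw = [
        {"ts": 0, "program": "sendmail", "src": "a", "dst": "b", "action": "deny"},
        {"ts": 1, "program": "fw", "src": "a", "dst": "b", "action": "deny"},
        {"ts": 2, "program": "fw", "src": "a", "dst": "b", "action": "deny"},
        {"ts": 61, "program": "fw", "src": "a", "dst": "b", "action": "deny"},
    ]
    kept = filter_events(raw)
    merged = aggregate_events(kept, key_fields=("src", "dst", "action"))
    print(len(raw), "->", len(kept), "->", len(merged))
```

The fewer fields that go into key_fields, the more events land in the same bucket, which is the point made above about events with few useful fields aggregating well.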

Performance

As far as raw performance is concerned, here are some known numbers that we can work with. ArcSight connectors can top out at about 1500 events per second (EPS), or ~130M EPD, which means that at dramatically higher event rates the data streams have to be split across multiple connectors. The actual number of "connector" instances presents a less significant challenge than deciding how exactly the streams will be split. The choices for splitting the event streams can include a combination of geography, platform, application group, server group, or other factors. The key here is how much data can be processed and correlated by an individual lower-tier SIEM manager, because that limits the scope of the correlation. We have seen ArcSight managers that are able to process up to 12K EPS (~1B EPD), counted as individual event records after aggregation and filtering. Again, the reduction in event rates depends mostly on the contents of the logs themselves and on how the data streams are divided among multiple lower-tier managers. Individual scenarios can be discussed in more depth if more information is available on application positioning.
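The conversions behind these figures are simple enough to check in a few lines. The sketch below uses the 1500 EPS and 12K EPS ceilings quoted above; the 30% post-reduction ratio is purely an assumption for illustration, and all results are based on average rates, so real peak loads would call for additional headroom.

```python
import math

SECONDS_PER_DAY = 86_400


def eps_to_epd(eps):
    """Events per second -> events per day."""
    return eps * SECONDS_PER_DAY


def epd_to_eps(epd):
    """Events per day -> average events per second."""
    return epd / SECONDS_PER_DAY


def instances_needed(total_eps, per_instance_eps):
    """Minimum number of instances to carry total_eps at the given ceiling."""
    return math.ceil(total_eps / per_instance_eps)


if __name__ == "__main__":
    total_eps = epd_to_eps(10_000_000_000)  # the 10G EPD example
    print("connector ceiling in EPD:", eps_to_epd(1500))
    print("manager ceiling in EPD:", eps_to_epd(12_000))
    print("average EPS:", round(total_eps))
    print("connectors:", instances_needed(total_eps, 1500))
    # Assumed: filtering and aggregation leave 30% of the raw volume.
    print("lower-tier managers:", instances_needed(total_eps * 0.30, 12_000))
```

Under these assumptions the 10G EPD example averages roughly 116K EPS, which needs at least 78 connector instances, and a handful of lower-tier managers once filtering and aggregation have done their work.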

Tiering

Correlated and other significant events would then be forwarded to the upper-tier manager, which performs further correlation across the whole monitoring network. The upper-tier manager would also serve as the primary interface for the analysts. The number of events processed by this manager naturally depends on the correlation rules and other content built into the lower-tier managers.
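As a rough illustration of the forwarding decision, a lower-tier manager could be thought of as applying a selection rule like the one below. The correlated flag, the priority field and the threshold of 8 are hypothetical names and values chosen for this sketch, not ArcSight configuration.

```python
def select_for_upper_tier(events, min_priority=8):
    """Keep only correlated events and events at or above a priority
    threshold; everything else stays on the lower tier."""
    return [
        e for e in events
        if e.get("correlated", False) or e.get("priority", 0) >= min_priority
    ]


if __name__ == "__main__":
    lower_tier_events = [
        {"name": "port scan detected", "correlated": True, "priority": 5},
        {"name": "admin login from new country", "priority": 9},
        {"name": "firewall deny", "priority": 2},
    ]
    for event in select_for_upper_tier(lower_tier_events):
        print(event["name"])
```

The tighter this rule is, the less the upper tier has to process, at the cost of leaving it with less raw material for cross-network correlation.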

Retention

Retention requirements also have to be taken into consideration when dealing with high event rates. Your infrastructure may process billions of events per day, but if the compliance team mandates that they have to be stored for several years, you might be in trouble. An average ArcSight event, for example, takes about 750 bytes in the ESM database, which means that 10G EPD stored unreduced would take ~7.5 TB of disk space daily, and that's without accounting for redundancy. The key approach here is to keep the data in the SIEM only for as long as your CIRT team practically needs it (perhaps 2-4 weeks) and then rely on log management solutions for long-term storage. Of course, it never hurts to sit down with your audit/compliance team and finalize a very specific list detailing which events have to be retained for an extended period, and discard the rest.
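The storage arithmetic is worth spelling out. At 750 bytes per event, 10 billion events per day works out to 7.5 TB per day before any filtering, aggregation or redundancy. The sketch below uses decimal terabytes (10^12 bytes); the retention periods are the illustrative 2 to 4 weeks mentioned above.

```python
BYTES_PER_TB = 10**12  # decimal terabytes


def daily_storage_tb(events_per_day, bytes_per_event):
    """Raw database footprint of one day of events, in TB."""
    return events_per_day * bytes_per_event / BYTES_PER_TB


def retention_storage_tb(events_per_day, bytes_per_event, days):
    """Footprint of keeping `days` worth of events online, in TB."""
    return daily_storage_tb(events_per_day, bytes_per_event) * days


if __name__ == "__main__":
    epd = 10_000_000_000   # the 10G EPD example
    event_size = 750       # average bytes per event, as quoted above
    print("per day:", daily_storage_tb(epd, event_size), "TB")
    for days in (14, 28):
        print(days, "days:", retention_storage_tb(epd, event_size, days), "TB")
```

Even a four-week window at the unreduced rate runs into the hundreds of terabytes, which is why both the event reduction upstream and the hand-off to a log management tier matter so much.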

Scalability

As the infrastructure's demands grow, scalability planning has to take into account how new servers and applications are deployed. The architecture can meet rising processing requirements both by increasing the performance of individual connector and lower-tier manager instances and by adding more of them.

Next Steps

Ultimately, the final details of the architecture in our example would depend on the answers to the following questions:

  • What is the original log delivery mechanism used by the applications?
  • What do the raw application logs look like? What information do they contain?
  • What are the capabilities of the current log aggregation solution and can they be leveraged in SIEM architecture?
  • How are the applications or the systems hosting these applications structured?
  • What are the actual event rates per application, per platform, per geo location, and so on? What is the exact total daily event rate?
  • What are some of the more urgent 'use cases' that the security management team wants to address?
  • Is there any specific type of activity that should be in scope for monitoring, outside of the common use cases, such as exploitation, abuse, unauthorized access, privilege escalation and so on?

How about your own efforts, reader? Have you yourself ever had to address large event volumes? What were your biggest challenges? Let us know.

Until Next Time,

Anton G.