WSO2 Stream Processor Performance Analysis [Internship Project — 01]

Stream Processing Architecture

Real-time streaming data ingest is a common requirement for many big data use cases. In fields like IoT, e-commerce, security, communications, entertainment and finance, where so much depends on timely and accurate data-driven decision making, real-time data collection and analysis are in fact core to the business.

However, collecting, storing and processing streaming data in large volumes and at high velocity presents architectural challenges. An important first step in delivering real-time data analysis is ensuring that adequate network, compute, storage, and memory resources are available to capture fast data streams. But a company’s software stack must match the performance of its physical infrastructure. Otherwise, businesses will face a massive backlog of data, or worse, missing or incomplete data.

To fulfill these requirements, WSO2 has introduced the latest version of WSO2 Stream Processor, which can analyze data in real time. You can find the latest version of Stream Processor by clicking this link, and the documentation here.

It offers fairly high event-processing capacity, helping us understand the impact of events, identify patterns, and react within milliseconds.

My first internship project is to analyze the performance of WSO2 Stream Processor. We have divided the testing into two phases:

  • Event Ingestion with Persistence
  • Siddhi as a Library

In this post, we discuss only the Event Ingestion with Persistence performance analysis.

For the persistence experiments, we selected three data stores:

  1. Microsoft SQL Server
  2. MySQL
  3. Oracle

Microsoft SQL Server Event Store

Infrastructure used

  • c4.2xlarge (8 vCPU, 16GB RAM, EBS storage with 1000 Mbps max dedicated bandwidth) Amazon EC2 instance as the SP node
    * Linux kernel 4.44, java version “1.8.0_131”, JVM flags : -Xmx4g -Xms2g
  • db.m4.2xlarge (8 vCPU, 32 GB RAM, EBS-optimized storage with 100 Mbps max dedicated bandwidth) with MS SQL Enterprise Edition 2016 as the database node
  • Customized TCP client as the data publisher (Sample TCP client found in samples)
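The publisher in these tests was a customized TCP client (the actual client ships with the SP samples). As an illustrative stand-in only — the newline-delimited framing, field layout, and ~180-byte padding below are assumptions, not the WSO2 wire format — a rate-paced TCP publisher can be sketched in Python:

```python
import socket
import threading
import time

def run_sink(server_sock, counts):
    """Minimal TCP sink that counts newline-delimited events (stand-in for SP's TCP receiver)."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            counts[0] += chunk.count(b"\n")

def publish(host, port, total_events, target_eps):
    """Send total_events newline-delimited events, paced to target_eps events/second."""
    interval = 1.0 / target_eps
    with socket.create_connection((host, port)) as sock:
        start = time.monotonic()
        for i in range(total_events):
            # ~180-byte event payload; the field layout is hypothetical
            event = f"{i},process-{i % 100},{time.time()}".ljust(179) + "\n"
            sock.sendall(event.encode())
            ahead = start + (i + 1) * interval - time.monotonic()
            if ahead > 0:          # sleep only when ahead of the target schedule
                time.sleep(ahead)
        return time.monotonic() - start

if __name__ == "__main__":
    server = socket.socket()
    server.bind(("127.0.0.1", 0))   # ephemeral port
    server.listen(1)
    received = [0]
    threading.Thread(target=run_sink, args=(server, received), daemon=True).start()
    elapsed = publish("127.0.0.1", server.getsockname()[1],
                      total_events=1000, target_eps=2000)
    time.sleep(0.2)                 # let the sink drain the socket
    print(f"sent 1000 events in {elapsed:.2f}s, sink counted {received[0]}")
```

The pacing loop sleeps only when the sender is ahead of schedule, so the achieved rate converges on the target events/second instead of drifting below it.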

Scenario 1: Insert Query — Persisting 1,250 Million Process Monitoring Events on Microsoft SQL Server

This test involved persisting process monitoring events, each approximately 180 bytes in size. The test injected 1,250 million events into Stream Processor at a publishing rate of 55,000 events/second.
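For a sense of scale, some back-of-the-envelope arithmetic on this scenario, using the event size and publishing rate quoted above:

```python
events = 1_250_000_000        # total events injected
event_size_bytes = 180        # approximate size of one process-monitoring event
rate_eps = 55_000             # sustained publishing rate, events/second

duration_s = events / rate_eps
data_gb = events * event_size_bytes / 1e9   # decimal gigabytes

print(f"test duration: ~{duration_s / 3600:.1f} hours")   # → test duration: ~6.3 hours
print(f"data persisted: ~{data_gb:.0f} GB")               # → data persisted: ~225 GB
```

In other words, sustaining 55,000 events/second for the full run means the event store absorbs roughly 225 GB over about six hours.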

[Chart: Throughput]
[Chart: Latency]

Summary Results

+------------+----------------------+
| Throughput | 55,000 events/second |
+------------+----------------------+
| Latency    | 45 milliseconds      |
+------------+----------------------+

Scenario 2: Update Query — Updating 10 Million Events on the MS SQL Data Store

The test injected 10 million events into a properly indexed MS SQL database. 4 million update queries were then performed at a publishing rate of 1,000 events/second.

[Chart: Throughput]
[Chart: Latency]

Summary Results

+----------------------------+---------------------+
| Throughput                 | 1,000 events/second |
+----------------------------+---------------------+
| Latency                    | 4.5 milliseconds    |
+----------------------------+---------------------+
| Number of Persisted Events | 10 million          |
+----------------------------+---------------------+

Oracle Event Store

Infrastructure used

  • c4.2xlarge (8 vCPU, 16GB RAM, EBS storage with 1000 Mbps max dedicated bandwidth) Amazon EC2 instance as the SP node
    * Linux kernel 4.44, java version “1.8.0_131”, JVM flags : -Xmx4g -Xms2g
  • db.m4.2xlarge (8 vCPU, 32 GB RAM, EBS-optimized storage with 100 Mbps max dedicated bandwidth) Amazon RDS instance with Oracle as the database node
  • Customized TCP client as the data publisher (TCP producer found in samples)
  • Experiments were carried out using Oracle 12c.

Scenario 1: Insert Query — Persisting 1,488 Million Process Monitoring Events on Oracle

This test involved persisting process monitoring events, each approximately 180 bytes in size. The test injected 1,488 million events into Stream Processor at a publishing rate of 70,000 events/second.

[Chart: Throughput]
[Chart: Latency]

Summary Results

+------------+----------------------+
| Throughput | 70,000 events/second |
+------------+----------------------+
| Latency    | 44 milliseconds      |
+------------+----------------------+

Scenario 2: Update Query — Updating 10 Million Events on the Oracle Data Store

The test injected 10 million events into a properly indexed Oracle database. 75 million update queries were then performed at a publishing rate of 20,000 events/second.

[Chart: Throughput]
[Chart: Latency]

Summary Results

+----------------------------+----------------------+
| Throughput                 | 20,000 events/second |
+----------------------------+----------------------+
| Latency                    | 12 milliseconds      |
+----------------------------+----------------------+
| Number of Persisted Events | 10 million           |
+----------------------------+----------------------+

MySQL Event Store

Infrastructure used

  • c4.2xlarge (8 vCPU, 16GB RAM, EBS storage with 1000 Mbps max dedicated bandwidth) Amazon EC2 instances as the SP node
    * Linux kernel 4.44, java version “1.8.0_131”, JVM flags : -Xmx4g -Xms2g
  • db.m4.2xlarge (8 vCPU, 32 GB RAM, EBS-optimized storage with 100 Mbps max dedicated bandwidth) Amazon RDS instance with MySQL Community Edition version 5.7 as the database node
  • Customized TCP client as the data publisher (TCP producer found in samples)
  • Experiments were carried out using MySQL Community Server 5.7.19.

Scenario 1: Insert Query — Persisting 18 Million Process Monitoring Events on MySQL

This test involved persisting process monitoring events, each approximately 180 bytes in size. The test injected 18 million events into Stream Processor at a publishing rate of 3,400 events/second.

[Chart: Throughput]

After around 12.2 million events were published, a sudden drop was observed in receiver performance; this can be considered the upper limit of the MySQL event store. To continue receiving events without major performance degradation, data has to be purged periodically before the table reaches this limit.
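That purge-before-the-limit strategy can be sketched as a simple threshold check. The 12.2 million figure is the limit observed above; the safety margin and keep fraction are assumptions for illustration:

```python
OBSERVED_LIMIT = 12_200_000   # row count at which MySQL throughput dropped in this test
SAFETY_MARGIN = 0.8           # purge well before the observed limit (assumed margin)

def should_purge(row_count, limit=OBSERVED_LIMIT, margin=SAFETY_MARGIN):
    """Return True once the event table grows past margin * limit."""
    return row_count >= limit * margin

def rows_to_delete(row_count, keep_fraction=0.5, limit=OBSERVED_LIMIT, margin=SAFETY_MARGIN):
    """How many of the oldest rows to delete so that keep_fraction of the budget remains."""
    if not should_purge(row_count, limit, margin):
        return 0
    target = int(limit * margin * keep_fraction)
    return max(0, row_count - target)

print(should_purge(9_000_000))    # → False
print(should_purge(9_800_000))    # → True  (purge budget is 9.76 million rows)
```

In practice the deletion itself would run as a scheduled job issuing batched DELETEs (or partition drops) against the event table.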

Scenario 2: Update Query — Updating 100K Events on the MySQL Data Store

The test injected 100,000 events into a properly indexed MySQL database. 3 million update queries were then performed at a publishing rate of 500 events/second.

[Chart: Throughput]

Summary Results

+----------------------------+-------------------+
| Throughput                 | 500 events/second |
+----------------------------+-------------------+
| Latency                    | 0.5 milliseconds  |
+----------------------------+-------------------+
| Number of Persisted Events | 100,000           |
+----------------------------+-------------------+

The test results indicate that Stream Processor's event-persistence performance is bounded by the performance of the event store database. They also suggest that a significant performance gain could be obtained by using SSD-based storage on the event store database server.

We will discuss Siddhi-as-a-library performance in the next blog :)

Software Engineer @ hSenid Mobile | Former Software Engineering Intern @ WSO2 | Software Engineering Undergraduate @ IIT