Guidelines for database and application sizing
This article provides general recommendations for sizing the WorkZone database and application.
Database storage
The primary factors to consider when calculating database size are the documents and the document-related features that are enabled in the database. However, you should also consider the following factors:
- Number of Users: The number of users affects storage needs in several ways, for example, the use log grows with each user.
- Number of Cases: Expected number of cases.
- Number of Documents: Expected number of documents in the archive.
- Number of Processes: Expected number of processes.
- Average size of Documents: Average size across all document types.
- Free Text Index Enabled: Whether free text indexing is enabled. Free text indexing adds half of the basic document storage again, that is, a multiplier of 1 without indexing and 1.5 with indexing.
- Rendition activated on Documents: This indicates whether the PDF Crawler is generating PDFs for documents. PDF rendition doubles the storage need per document.
- Version Settings enabled on Documents: If version settings are enabled, this will increase the storage needs as well.
- Average Versions per Versioned Document: This can be 0 – 100. Estimate your average number of versions.
- Percentage of Versioned Documents: Many documents will never be versioned, for example, archived documents from external sources. Therefore, you should estimate a percentage of documents that will be versioned.
- Expected Yearly Growth: To prepare the database for growth, you can calculate the additional capacity needed for an on-premises database on a yearly basis.

Storage type | Calculation | Storage required |
---|---|---|
Basic Document Storage | ("Number of Documents" * "Average size of Documents") / 1024 | 0 GB |
Free text | "Basic Document Storage" * 0.5 | 0 GB |
Renditions | "Basic Document Storage" * 1 | 0 GB |
Versions | ("Percentage of Versioned Documents" * "Number of Documents") * "Average Versions per Versioned Document" * "Average size of Documents" / 1024 | 0 GB |
Meta Data Storage | "Number of Cases" * 1 KB + "Number of Processes" * 10 KB + "Number of Documents" * 1 KB + "Number of Users" * 50 KB | 0 GB |
Total Storage | | 0 GB |
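The formulas in the table above can be sketched as a small calculator. This is an illustrative sketch, not part of the product: the function name and parameters are hypothetical, and it assumes that "Average size of Documents" is given in MB (so dividing by 1024 yields GB) and that the metadata constants are in KB.

```python
def estimate_storage_gb(
    num_users: int,
    num_cases: int,
    num_documents: int,
    num_processes: int,
    avg_doc_size_mb: float,
    free_text_enabled: bool = False,
    renditions_enabled: bool = False,
    versioned_fraction: float = 0.0,  # "Percentage of Versioned Documents" as 0..1
    avg_versions: float = 0.0,        # "Average Versions per Versioned Document"
) -> float:
    """Estimate total database storage in GB from the sizing table."""
    # Basic Document Storage: documents * average size (MB) -> GB
    basic = num_documents * avg_doc_size_mb / 1024
    # Free text indexing adds 50% of basic storage; renditions add 100%
    free_text = basic * 0.5 if free_text_enabled else 0.0
    renditions = basic * 1.0 if renditions_enabled else 0.0
    # Versions: versioned documents * versions * average size (MB) -> GB
    versions = (versioned_fraction * num_documents) * avg_versions * avg_doc_size_mb / 1024
    # Meta data: 1 KB per case, 10 KB per process, 1 KB per document, 50 KB per user
    meta_kb = num_cases * 1 + num_processes * 10 + num_documents * 1 + num_users * 50
    meta = meta_kb / (1024 * 1024)  # KB -> GB
    return basic + free_text + renditions + versions + meta
```

For example, an archive of 1,000,000 documents averaging 0.5 MB with free text indexing enabled would need roughly 488 GB basic storage plus 244 GB for the index, before versions and metadata.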
Database CPU
The cost driver for databases is CPU. Calculating the CPU needs for a WorkZone database can be difficult due to the different workloads that you can activate in WorkZone.
The CPU output can differ based on the platform that you choose for running the database.
CPU Estimates for Oracle ADB
For Oracle ADB, the cost factor is ECPU. The advantage of ADB ECPUs is that they are scalable, allowing a more dynamic setup where you, to a greater degree, pay for what you use. For average workloads, the current rule of thumb is that 2 ECPUs can service up to 200-300 users. Configuring auto-scaling on top of that will allow you to handle peak workloads as well.
You can then extrapolate in steps for the same type of workload: 4 ECPUs equals 400-600 users, and so forth. It is important to monitor the actual load so that you do not overprovision the CPU, but this should provide you with a general idea of the costs.
CPU Estimates for Oracle EE and SE on-premises
For Oracle EE and SE, the cost factor is CPU. The challenge with this model is that you have to estimate for the expected maximum (peak) CPU usage up front. Also, CPU performance can differ depending on the Oracle platform in use. For average workloads on this model, the current rule of thumb is that 1 CPU covers a maximum of 2 cores and services approximately 50-100 users.
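The two rules of thumb above can be turned into a rough estimator. This is a sketch under stated assumptions: the function names are hypothetical, scaling is assumed linear in steps, and the conservative lower bounds (200 users per 2 ECPUs, 50 users per CPU) are used as defaults.

```python
import math

def adb_ecpus(users: int, users_per_2_ecpu: int = 200) -> int:
    """Estimate Oracle ADB ECPUs, scaling in steps of 2 ECPUs
    (rule of thumb: 2 ECPUs per ~200-300 users)."""
    return 2 * max(1, math.ceil(users / users_per_2_ecpu))

def onprem_cpus(users: int, users_per_cpu: int = 50) -> int:
    """Estimate Oracle EE/SE CPUs sized for peak usage
    (rule of thumb: 1 CPU, max 2 cores, per ~50-100 users)."""
    return max(1, math.ceil(users / users_per_cpu))
```

With the conservative defaults, 500 users would estimate to 6 ECPUs on ADB or 10 CPUs on-premises; monitoring the actual load remains the way to avoid overprovisioning.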
Application Kubernetes sizing guidelines
The sizing of your WorkZone Kubernetes instance is important for ensuring availability and scalability of your workload.
Installations that use the full master Helm chart, including all WorkZone Pods, have the following requirements for storage, CPU, and memory.
Requirements for running all WorkZone Pods
Node Type | Number of Pods | Storage required* | CPU required* | Memory required* |
---|---|---|---|---|
Linux node(s) | 8 WorkZone Pods, 72 Control Pods | 200 GB | 8 vCPUs | 32 GiB |
Windows node(s) | 44 WorkZone Pods, 10 Control Pods | 400 GB | 12 vCPUs | 48 GiB |
*Numbers are approximate and guiding. They will fluctuate, as Pods change in size and requirements.
Requirements for scalable containers
A number of the containers are scalable. Depending on the workloads that are serviced by the Kubernetes instance, the scaling pattern may differ. For example:
- Workloads with intense automation may require scaling of the Process-related containers.
- Workloads that generate many documents may require scaling of the PDF service.
- Workloads with heavy integrations may require scaling of the OData service.
- Heavy user interaction may require scaling of the Client.
The patterns for scaling containers have to be an assessment of the individual workloads, preferably by using a platform with auto-scaling capabilities and monitoring the behavior of the workloads.
For more detailed sizing, you have to assess the expected behavior of the instances and make a more granular calculation. See WorkZone containers.
Sizing guidelines for nodes
Depending on the purpose, workloads and availability of the WorkZone instance, there are different sizing guides.
Basic workloads (Dev and Test)
For Basic workloads for testing or development and no expectations of availability, use the Guidelines for database and application sizing table above.
High availability workloads (small production)
With workloads that require high availability but have a small production workload, you need to be able to perform, for example, rolling upgrades. The sizing therefore needs to account for the extra CPU and memory required for running two Pods during the upgrade flow. For such workloads, the guidance is to double the requirements from the Guidelines for database and application sizing table above.

Making a production Kubernetes setup with one production server for high availability would, based on the recommendations above, result in the following node recommendations for your cluster:
3 x Windows Server 2019 nodes (each node having minimum 400 GB storage, 32 GiB memory, and 10 vCPUs)
Plus
3 x Linux nodes (each node having minimum 200 GB storage, 28 GiB memory, and 6 vCPUs)
High scalability (medium to large production and workloads)
Calculating for scalability is difficult because you need to understand the workloads and which Pods will require scaling. We suggest a simplified calculation model. A general recommendation is to monitor the auto-scaling behavior and adjust accordingly, adding more nodes to your cluster as needed.
One model is to start with the numbers calculated for high availability above, make an assessment of the expected scaling, and then add sizing from the Requirements for scalable containers for the expected scale. This should provide you with adequate room for scaling.
Node strategy
Currently, WorkZone requires both Linux and Windows nodes. This puts requirements on the orchestration platform, as well as the node sizing strategy.
Depending on the expected use and workload of the WorkZone instance that you are deploying, the node strategy can vary. In general, fewer nodes lower the guaranteed availability, and more nodes provide more availability.
- For dev and test workloads, you can be flexible about node sizes and numbers, as these workloads are less critical and require less availability.
- For production workloads, the recommendation is a minimum of 3 nodes of each OS, that is, 3 Linux and 3 Windows nodes.
You can use the sizing guidelines above to decide the combined node sizes, calculating the total storage, memory, and CPU needed for both the Linux and Windows nodes. When deciding the number of nodes, your sizing should, as a minimum, allow for at least one node to fail.

If you have 3 nodes and the sizing guidance number you reach is 100, you should size your node requirements for a minimum of 150 to allow WorkZone to continue running after one node has failed.
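The example above can be generalized: each node carries an equal share of the total capacity, and the surviving nodes must still cover the full requirement. This is an illustrative sketch; the function name is hypothetical.

```python
def cluster_capacity_with_failover(
    capacity_needed: float, nodes: int, tolerated_failures: int = 1
) -> float:
    """Total capacity to provision across `nodes` so that losing
    `tolerated_failures` nodes still leaves `capacity_needed` available.

    `capacity_needed` is the sizing-guideline total in whatever unit
    you are sizing (CPU, memory, or storage)."""
    surviving = nodes - tolerated_failures
    if surviving < 1:
        raise ValueError("not enough nodes to tolerate that many failures")
    # Each node carries capacity/nodes; the survivors must cover the full need.
    return capacity_needed * nodes / surviving
```

For the document's example, 3 nodes and a sizing number of 100 yields 150, and the same formula shows why more nodes lower the overhead: 4 nodes would only need about 133.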