
ADx Architecture

The ADx setup can be configured in many different ways. Each of the basic setups mentioned below covers different requirements.

This document provides setup and configuration examples - the actual configuration depends on specific requirements.

The ADx deployment provides the UI and API related to documents, including operations such as creating content, getting content, and searching content. For details, check the ADx REST API documentation.

The TF Conversion deployment offers document conversion services, such as merging PDFs, splitting PDFs, or converting office documents to PDF. For details check: Available Conversion Jobs

The Messaging deployment provides internal communication in a clustered environment.

Single Node Architecture Diagram - Setup 1

Setup 1 is intended to be used for POC, demo and development (against ADx API) purposes. All instances are on a single node. All data is stored in a single database and the file system is a single mount point. There is no focus on performance, scalability, fail-safety or backup strategies. The setup provides all the APIs available from ADx and TF conversion.

Setup 1 architecture diagram

Separate Single Node Architecture Diagram - Setup 2

Setup 2 is intended for development and testing purposes. ADx, TF conversion and ActiveMQ are separated on single nodes. Communication (internal as well as external) goes through optional load balancers. The databases are separated (e.g. to use TF conversion for multiple ADx deployments, or to give developer teams the flexibility to clean up their instances if necessary), as is the file system. Since this setup is not intended for production, there is no focus on scalability, fail-safety or backup strategies. Separating ADx and TF conversion isolates the CPU-intensive conversion workload from the ADx repositories, so the nodes can be sized appropriately. The setup provides all the APIs available from ADx and TF conversion.

Setup 2 architecture diagram

Separate Clustered Nodes Architecture Diagram - Setup 3

Setup 3 is intended for production purposes as well as for the latest test stage. ADx, TF conversion and ActiveMQ are separated on cluster nodes. Communication (internal as well as external) goes through load balancers. Each system, all databases and the file system are designed for maximum separation, enabling optimal performance, scalability, fail-safety, backup strategies and clear responsibilities.

Setup 3 architecture diagram

Communication Flow - Sequence Diagrams

Create Content


Create Repository


Node Requirements

The node requirements are meant per node. In case of cluster usage, the requirements multiply accordingly. The nodes themselves can be either VMs or bare metal. Take into consideration that the underlying operating system also needs some GB of memory.

Node requirements - ADx

| Setup | CPU | Memory | Network | Operating System |
| --- | --- | --- | --- | --- |
| Setup 1 * | 4 CPU | 12 GB | GBit or better | Linux - Kernel 3.10 or newer |
| Setup 2 | 4 CPU | 12 GB | GBit or better | Linux - Kernel 3.10 or newer |
| Setup 3 | 8 CPU | 32 GB | GBit or better | Linux - Kernel 3.10 or newer |

*shared on one node
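The per-node figures above can be totaled for a cluster with a small helper. This is an illustrative sketch, not an official sizing tool: the node count and the 2 GB operating-system overhead are assumptions to adjust per environment.

```python
# Illustrative cluster sizing based on the per-node requirements tables.
# node_count and os_overhead_gb are assumptions, not prescribed values.

def cluster_requirements(cpu_per_node, mem_gb_per_node, node_count, os_overhead_gb=2):
    """Return total CPUs and total memory (GB) for a cluster,
    adding a per-node allowance for the operating system."""
    total_cpu = cpu_per_node * node_count
    total_mem_gb = (mem_gb_per_node + os_overhead_gb) * node_count
    return total_cpu, total_mem_gb

# Example: a hypothetical Setup 3 ADx tier with 3 cluster nodes (8 CPU / 32 GB each)
print(cluster_requirements(8, 32, 3))  # → (24, 102)
```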

Node requirements - TF conversion

| Setup | CPU | Memory | Network | Operating System |
| --- | --- | --- | --- | --- |
| Setup 1 * | - | - | - | - |
| Setup 2 | 4 CPU | 12 GB | GBit or better | Linux - Kernel 3.10 or newer |
| Setup 3 | 8 CPU | 32 GB | GBit or better | Linux - Kernel 3.10 or newer |

*shared on one node

Backup

The backup recommendations are grouped into five categories. They apply to database data as well as to file system data.

| Category | Description |
| --- | --- |
| Priority 1 | This category holds the most critical data: business data (e.g. documents (content) stored in repositories) as well as the configuration of ADx/TF conversion itself (e.g. user information or repository configuration). Losing this kind of data means data loss! It is recommended to back up this category regularly (e.g. daily) to allow disaster recovery. |
| Priority 2 | This category holds data that can be restored from the Priority 1 information, but restoring it is time consuming. This includes e.g. representations of documents (content). It is recommended to back it up regularly at longer intervals (e.g. every 2 days / weekly). |
| Priority 3 | This category holds data that is relevant not for business users but from an admin perspective. This includes e.g. log files. It is recommended to back it up regularly at longer intervals (e.g. every 2 days / weekly). |
| Priority 4 | This category holds data which can be recovered by redeployment, or transient data. No backups are needed. |
| Priority external | ADx can access external repositories (e.g. Documentum) to provide a normalized view on documents. The external repository needs to be backed up independently of ADx. |

Operating system settings

  • Number of concurrently open file descriptors:

| Deployment | Value |
| --- | --- |
| ADx | 500000 |
| TF conversion | 100000 |
| Messaging | 10000 |

  • Random generator: Since servers have low entropy, a service to enrich it is necessary.

| Deployment | Link | Version |
| --- | --- | --- |
| ADx | haveged | 1.9.2+ |
| TF conversion | haveged | 1.9.2+ |
| Messaging | haveged | 1.9.2+ |

  • Command line tools:

| Deployment | Link | Version |
| --- | --- | --- |
| ADx | curl | 7.29+ |
| TF conversion | curl | 7.29+ |
| TF conversion | tesseract | 4.0.0+ |
| TF conversion | wkhtmltopdf | 0.12.5+ |
| Messaging | curl | 7.29+ |
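As a pre-flight check, the file-descriptor limits above can be verified on a node with Python's standard resource module. This is a sketch under the assumption that it runs under the same user account as the deployment; the required values simply mirror the table.

```python
# Sketch: verify the open-file-descriptor limit on this node against the
# documented requirements. Unix-only (uses the resource module).
import resource

# Required soft limits per deployment, mirroring the table above.
REQUIRED = {"ADx": 500_000, "TF conversion": 100_000, "Messaging": 10_000}

def check_fd_limit(deployment):
    """Return (ok, current_soft_limit, required) for the given deployment."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    required = REQUIRED[deployment]
    return soft >= required, soft, required

ok, soft, required = check_fd_limit("Messaging")
print(f"soft limit {soft}, required {required}, ok={ok}")
```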

Load Balancer

There are no special requirements for the load balancer: neither the algorithm nor the session handling (sticky or not) is constrained. Since some requests, e.g. a conversion health check, are synchronous, it is recommended to configure a session timeout ≥ 10 min.

Firewall settings

On each node (ADx/Conversion) the configured ports (see installation details) need to be opened:

  • The external communication (https for security reasons) which provides the API to users, developers, and other systems must be available via the load balancers (or node in case there is no load balancer).
  • Internally, the nodes need to open the ports for messaging in case of a clustered setup. Multicast must be allowed in case of automatic service discovery (optional).
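A minimal reachability probe for these firewall rules might look as follows. The host name below is a placeholder, and the ports come from your installation configuration; this is an illustrative sketch, not part of the product.

```python
# Sketch: check whether a configured port is reachable through the firewall.
# Host and port are placeholders - substitute your load balancer or node
# address and the ports from the installation configuration.
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: verify the external HTTPS endpoint answers on the default port.
print(port_open("adx.example.internal", 8443))
```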

Java requirements

Java Version: OpenJDK or Oracle JDK 8+

Java in general supports more than 32 GB of heap for one JVM, but note that above 32 GB Java uses 64-bit references, which by themselves consume more memory. When deciding to exceed the 32 GB boundary, consider that the memory must be increased dramatically to get a similar usable heap. In practice, this means that when going above 32 GB it is necessary to go beyond 40 GB. See http://java-performance.info/over-32g-heap-java/ for a detailed explanation.
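The reasoning above can be sketched numerically. The 1.3 overhead factor below is a rough assumption for reference-heavy heaps, not a measured constant; with it, a 31 GB compressed-references heap corresponds to roughly 40 GB with 64-bit references, which matches the guidance above.

```python
# Back-of-the-envelope model for the compressed-references boundary:
# below ~32 GB the JVM can use 32-bit compressed references; above it,
# 64-bit references inflate per-object memory. The factor is an assumption.

REF_OVERHEAD_FACTOR = 1.3  # assumed growth for reference-heavy heaps

def equivalent_heap_above_32g(heap_gb_below_32):
    """Estimate the 64-bit-references heap size needed to hold roughly
    the same data as a compressed-references heap of the given size."""
    return heap_gb_below_32 * REF_OVERHEAD_FACTOR

print(round(equivalent_heap_above_32g(31), 1))  # → 40.3
```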

To avoid heap resizing during the uptime of the servers, which leads to performance issues, -Xmx and -Xms should be equal.
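That recommendation can be sketched as a tiny helper that derives both flags from a single heap-size parameter. The helper itself is illustrative; only the maxHeapSize name mirrors the deployment parameter from the tables below.

```python
# Sketch: build JVM heap flags with -Xms equal to -Xmx, so the heap is
# allocated once at startup and never resized during uptime.

def jvm_heap_flags(max_heap_size="8g"):
    # Identical initial and maximum heap prevents runtime resizing.
    return [f"-Xms{max_heap_size}", f"-Xmx{max_heap_size}"]

print(jvm_heap_flags("8g"))  # → ['-Xms8g', '-Xmx8g']
```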

Based on the memory settings chosen in the setup configuration above, make sure that the operating system does not need to swap because of memory limitations. The remaining memory should then be assigned to ADx, TF conversion and Messaging.

Java settings - ADx

| Setting | Deployment Parameter | Value | Description |
| --- | --- | --- | --- |
| -Xmx | maxHeapSize | 4 GB - 31 GB | Maximum heap size |
| -Xms | initialHeapSize | 4 GB - 31 GB | Minimum heap size |

Java settings - TF conversion

| Setting | Deployment Parameter | Value | Description |
| --- | --- | --- | --- |
| -Xmx | maxHeapSize | 4 GB - 31 GB | Maximum heap size |
| -Xms | initialHeapSize | 4 GB - 31 GB | Minimum heap size |

Runtime

The runtime of ADx and TF conversion is Apache Tomcat (http://tomcat.apache.org/).

Runtime - ADx

| Setting | Deployment Parameter | Value | Default | Description |
| --- | --- | --- | --- | --- |
| http Port | httpPort | any available port | 8080 | exposed to the outside (attention, not encrypted!) |
| https Port | httpsPort | any available port | 8443 | exposed to the outside |
| AJP Port | ajpPort | any available port | 8009 | only internal usage - do not expose to the outside |
| server Port | serverPort | any available port | 8005 | only internal usage - do not expose to the outside |
| Maximum Connections | maxConnections | a positive number or no limit (recommended) | -1 (no limit) | maximum number of connections |
| Maximum Threads | maxThreads | ≥1000 | 4000 | maximum number of request workers |

Runtime - TF Conversion

| Setting | Deployment Parameter | Value | Default | Description |
| --- | --- | --- | --- | --- |
| http Port | httpPort | any available port | 8080 | exposed to the outside (attention, not encrypted!) |
| https Port | httpsPort | any available port | 8443 | exposed to the outside |
| AJP Port | ajpPort | any available port | 8009 | only internal usage - do not expose to the outside |
| server Port | serverPort | any available port | 8005 | only internal usage - do not expose to the outside |
| Maximum Connections | maxConnections | a positive number or no limit (recommended) | -1 (no limit) | maximum number of connections |
| Maximum Threads | maxThreads | ≥1000 | 4000 | maximum number of request workers |

Mount Points

Since the sizing of documents cannot be predicted, it is recommended to use a file system with dynamic growth capabilities (e.g. XFS or ZFS).

Mount Points - ADx

| Mount Point | Deployment Parameter | Size | Comment | Dynamic sizing | Backup | Type |
| --- | --- | --- | --- | --- | --- | --- |
| ADx installation directory | installationPath | 5 GB | holds ADx installation | no | Priority 4 | local |
| log files | logFilesDir | multiple GB - depending on log configuration | holds log files | yes | Priority 3 | local |
| temp files | tempDir | multiple GB - depending on load | holds temporary files | yes | Priority 4 | local |
| content files per repository | configurable via UI during creation of the repository | depending on number of documents | holds content files | yes | Priority 1 | shared* |
| fulltext index | ELASTIC_SERVICE_DATA_PATH | depending on number of documents | holds Elasticsearch fulltext index | yes | Priority 1 | shared* |

*if used, shared between ADx nodes

Mount Points - TF conversion

| Mount Point | Deployment Parameter | Size | Comment | Dynamic sizing | Backup | Type |
| --- | --- | --- | --- | --- | --- | --- |
| TF conversion installation directory | installationPath | 5 GB | holds TF conversion installation | no | Priority 4 | local |
| log files | logFilesDir | multiple GB - depending on log configuration | holds log files | yes | Priority 3 | local |
| temp files | tempDir | multiple GB - depending on load | holds temporary files | yes | Priority 4 | local |
| conversion job files | CONV_STORAGE_FOLDER | depending on the number of conversion requests | holds content files; only used if CONV_STORAGE_TYPE=fs | yes | Priority 4 | shared* |

*if used, shared between TF conversion nodes

Databases

Databases store the metadata and information for the different systems. ADx, for example, needs a database to store information about a document being checked out, the name of the document, or its history. A similar principle applies to Conversion. In addition, the system databases store user information, sessions, locks, and so on. A relational database handles this well, which is why one is used.

Database Sharing

It's possible to share databases in the configurations described below.

| Shared database | Important information |
| --- | --- |
| System database for ADx and Conversion | Users, sessions, and other system data are shared between ADx and Conversion in this setup. |
| Single database for Conversion system and access data | This setup is possible provided you don't need to back up your conversion system data. In practice this means you can share the database provided you don't have any special conversion users. |
| Single database for Conversion access data and ADx cache data | This setup is possible but not recommended. Conversion access data is transient; ADx cache data is also somewhat transient, but recreating it is really time consuming, so it requires regular backups. |

Database - ADx

ADx needs one SQL database for system-related data (such as sessions, user accounts or repository configuration).

ADx also needs SQL databases for each repository. For a tribefire repository, two databases are necessary: cache and content. For external repositories (e.g. Documentum, CMIS), one database is necessary: cache.
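A quick tally of these rules, as illustrative arithmetic only (the repository counts in the example are hypothetical):

```python
# Illustrative count of the SQL databases an ADx installation needs:
# one system database, two per tribefire repository (cache + content),
# and one cache database per external repository.

def adx_database_count(tribefire_repos, external_repos):
    return 1 + 2 * tribefire_repos + 1 * external_repos

# Example: 3 tribefire repositories and 1 external (e.g. Documentum) repository
print(adx_database_count(3, 1))  # → 8
```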

SQL Database - ADx

| Usage | Description | Backup | Type |
| --- | --- | --- | --- |
| System database | contains user information, repository configuration, user sessions, locking information, leadership information | Priority 1 | shared between all ADx nodes, optionally shared with TF conversion and/or Messaging |
| Cache database | contains cache/representations of content - can be recreated in a time consuming process | Priority 2 | shared between all ADx nodes |
| Content database | contains business data | Priority 1 | shared between all ADx nodes |

SQL Database - ADx - PostgreSQL

System database:

| Requirement | Parameter | Value |
| --- | --- | --- |
| Version | --- | 11 or newer |
| ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize* | 0 |
| ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize* | 20 |
| ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize* | 0 |
| ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize* | 10 |
| ConnectionPool size Authorization - min | AUTH_DB.minPoolSize* | 0 |
| ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize* | 10 |
| ConnectionPool size Locking - min | LOCKING_DB.minPoolSize* | 0 |
| ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize* | 5 |
| ConnectionPool size Leadership - min | LEADERSHIP_DB.minPoolSize* | 0 |
| ConnectionPool size Leadership - max | LEADERSHIP_DB.maxPoolSize* | 5 |

*or the default configuration of DEFAULT_DB. The sum of the maximum sizes of all connection pools is the number of connections the database must provide.
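Following that footnote, the required number of database connections for the default maximum pool sizes can be tallied (illustrative arithmetic only; the values mirror the table above):

```python
# Required database connections = sum of all maximum connection pool sizes,
# per the footnote above. Values mirror the documented defaults.

MAX_POOL_SIZES = {
    "USER_SESSIONS_DB": 20,
    "USER_SESSION_STATISTICS_DB": 10,
    "AUTH_DB": 10,
    "LOCKING_DB": 5,
    "LEADERSHIP_DB": 5,
}

required_connections = sum(MAX_POOL_SIZES.values())
print(required_connections)  # → 50
```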

Cache database:

| Requirement | Value |
| --- | --- |
| Version | 11 or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

Content database:

| Requirement | Value |
| --- | --- |
| Version | 11 or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

SQL Database - ADx - Oracle

System database:

| Requirement | Parameter | Value |
| --- | --- | --- |
| Version | --- | 11g or newer |
| ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize* | 0 |
| ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize* | 20 |
| ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize* | 0 |
| ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize* | 10 |
| ConnectionPool size Authorization - min | AUTH_DB.minPoolSize* | 0 |
| ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize* | 10 |
| ConnectionPool size Locking - min | LOCKING_DB.minPoolSize* | 0 |
| ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize* | 5 |
| ConnectionPool size Leadership - min | LEADERSHIP_DB.minPoolSize* | 0 |
| ConnectionPool size Leadership - max | LEADERSHIP_DB.maxPoolSize* | 5 |

*or the default configuration of DEFAULT_DB. The sum of the maximum sizes of all connection pools is the number of connections the database must provide.

Cache database:

| Requirement | Value |
| --- | --- |
| Version | 11g or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

Content database:

| Requirement | Value |
| --- | --- |
| Version | 11g or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

SQL Database - ADx - MSSQL

System database:

| Requirement | Parameter | Value |
| --- | --- | --- |
| Version | --- | 12 or newer |
| ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize* | 0 |
| ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize* | 20 |
| ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize* | 0 |
| ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize* | 10 |
| ConnectionPool size Authorization - min | AUTH_DB.minPoolSize* | 0 |
| ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize* | 10 |
| ConnectionPool size Locking - min | LOCKING_DB.minPoolSize* | 0 |
| ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize* | 5 |
| ConnectionPool size Leadership - min | LEADERSHIP_DB.minPoolSize* | 0 |
| ConnectionPool size Leadership - max | LEADERSHIP_DB.maxPoolSize* | 5 |

*or the default configuration of DEFAULT_DB. The sum of the maximum sizes of all connection pools is the number of connections the database must provide.

Cache database:

| Requirement | Value |
| --- | --- |
| Version | 12 or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

Content database:

| Requirement | Value |
| --- | --- |
| Version | 12 or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

Database - TF conversion

TF conversion needs one SQL database for system-related data (such as sessions or user accounts).

TF conversion needs a SQL database for storing the job information.

In addition, a proprietary database is used on all nodes; it holds the configuration of TF conversion.

SQL Database - TF conversion

| Usage | Description | Backup | Type |
| --- | --- | --- | --- |
| System database | contains user information, repository configuration, user sessions, locking information, leadership information | Priority 1 | shared between all TF conversion nodes, optionally shared with ADx and/or Messaging |
| Conversion database | contains transient data | Priority 4 | shared between all TF conversion nodes |

SQL Database - TF conversion - PostgreSQL

System database:

| Requirement | Parameter | Value |
| --- | --- | --- |
| Version | --- | 11 or newer |
| ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize* | 0 |
| ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize* | 20 |
| ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize* | 0 |
| ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize* | 10 |
| ConnectionPool size Authorization - min | AUTH_DB.minPoolSize* | 0 |
| ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize* | 10 |
| ConnectionPool size Locking - min | LOCKING_DB.minPoolSize* | 0 |
| ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize* | 5 |
| ConnectionPool size Leadership - min | LEADERSHIP_DB.minPoolSize* | 0 |
| ConnectionPool size Leadership - max | LEADERSHIP_DB.maxPoolSize* | 5 |

*or the default configuration of DEFAULT_DB. The sum of the maximum sizes of all connection pools is the number of connections the database must provide.

Conversion Job database:

| Requirement | Value |
| --- | --- |
| Version | 11 or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

If CONV_STORAGE_TYPE=db, the conversion files are stored in the database. This needs to be considered when sizing the database and configuring its caching.

SQL Database - TF conversion - Oracle

System database:

| Requirement | Parameter | Value |
| --- | --- | --- |
| Version | --- | 11g or newer |
| ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize* | 0 |
| ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize* | 20 |
| ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize* | 0 |
| ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize* | 10 |
| ConnectionPool size Authorization - min | AUTH_DB.minPoolSize* | 0 |
| ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize* | 10 |
| ConnectionPool size Locking - min | LOCKING_DB.minPoolSize* | 0 |
| ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize* | 5 |
| ConnectionPool size Leadership - min | LEADERSHIP_DB.minPoolSize* | 0 |
| ConnectionPool size Leadership - max | LEADERSHIP_DB.maxPoolSize* | 5 |

*or the default configuration of DEFAULT_DB. The sum of the maximum sizes of all connection pools is the number of connections the database must provide.

Conversion Job database:

| Requirement | Value |
| --- | --- |
| Version | 11g or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

If CONV_STORAGE_TYPE=db, the conversion files are stored in the database. This needs to be considered when sizing the database and configuring its caching.

SQL Database - TF conversion - MSSQL

System database:

| Requirement | Parameter | Value |
| --- | --- | --- |
| Version | --- | 12 or newer |
| ConnectionPool size User Session - min | USER_SESSIONS_DB.minPoolSize* | 0 |
| ConnectionPool size User Session - max | USER_SESSIONS_DB.maxPoolSize* | 20 |
| ConnectionPool size User Statistics - min | USER_SESSION_STATISTICS_DB.minPoolSize* | 0 |
| ConnectionPool size User Statistics - max | USER_SESSION_STATISTICS_DB.maxPoolSize* | 10 |
| ConnectionPool size Authorization - min | AUTH_DB.minPoolSize* | 0 |
| ConnectionPool size Authorization - max | AUTH_DB.maxPoolSize* | 10 |
| ConnectionPool size Locking - min | LOCKING_DB.minPoolSize* | 0 |
| ConnectionPool size Locking - max | LOCKING_DB.maxPoolSize* | 5 |
| ConnectionPool size Leadership - min | LEADERSHIP_DB.minPoolSize* | 0 |
| ConnectionPool size Leadership - max | LEADERSHIP_DB.maxPoolSize* | 5 |

*or the default configuration of DEFAULT_DB. The sum of the maximum sizes of all connection pools is the number of connections the database must provide.

Conversion Job database:

| Requirement | Value |
| --- | --- |
| Version | 12 or newer |
| ConnectionPool size min | 10 |
| ConnectionPool size max | 150 |

If CONV_STORAGE_TYPE=db, the conversion files are stored in the database. This needs to be considered when sizing the database and configuring its caching.