Apache Hadoop - Apache Hadoop is a programming framework that supports the processing of large data sets in a distributed computing environment.
Application - unified batch and real-time stream processing application running on Apache Apex platform.
Container - A physical resource with CPU and memory constraints allocated by YARN’s Resource Manager.
Current Window Id - Sequentially increasing identifier for a specific computation time period within Apache Apex platform.
DAG - Directed Acyclic Graph, formed by a collection of vertices and directed edges without cycles. Sometimes interchangeably used to describe Apache Apex applications, specifically referring to their Logical / Physical plans, composed of operators connected by streams.
Data Tuples Emitted - Number of data objects emitted by the operators with an output port.
Data Tuples Processed - Number of data objects processed by the operators in an Apache Apex application.
Logical Plan - Logical representation of an Apache Apex application, where the computational nodes are called Operators and the data-flow edges are called Streams.
Operator - An entity that holds a computational logic to process the data tuples. It is part of a real-time stream processing application. The Operator computational logic gets executed inside a YARN Container.
Physical Operator - Physical representation of the operator, which contains information such as the name of container and the Hadoop node where operator instance is running.
Physical Plan - Physical representation of the Logical Plan of the application and is a blueprint of how the application will run inside YARN containers deployed across nodes in a Hadoop cluster.
Port - Each operator can have ports on which it can receive input data tuples and also output processed data tuples.
Recovery Window Id - Identifier for the last computational window at which the operator state was check-pointed into HFDS.
Resource Manager - YARN component that allocates and arbitrates the resources such as CPU, Memory and Network.
STRAM - Streaming Application Manager is the first process that is activated upon application launch and orchestrates the deployment, management, and monitoring of the Apache Apex applications throughout their lifecycle.
Stream - A stream consists of data tuples that flow from one port of an operator to another.
YARN - Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster resource management technology, introduced with Hadoop 2.0.
DataTorrent RTS
dtAssemble - graphical application assembly tool used to develop applications.
dtDashboard - graphical visualization tool to view and query system and application data.
dtGateway - HTTP server used by dtManage to interact with STRAM, YARN Resource Manager, and HDFS
dtManage - the web based interface to install, configure, manage & monitor Apache Apex applications running in a Hadoop Cluster