Apache Apex Development Environment Setup
This document discusses the steps needed for setting up a development environment for creating applications that run on the Apache Apex or the DataTorrent RTS streaming platform.
Microsoft Windows
There are a few tools that will be helpful when developing Apache Apex applications, some required and some optional:
-
git -- A revision control system (version 1.7.1 or later). There are multiple git clients available for Windows (http://git-scm.com/download/win for example), so download and install a client of your choice.
-
java JDK (not JRE). Includes the Java Runtime Environment as well as the Java compiler and a variety of tools (version 1.7.0_79 or later). Can be downloaded from the Oracle website.
-
maven -- Apache Maven is a build system for Java projects (version 3.0.5 or later). It can be downloaded from https://maven.apache.org/download.cgi.
-
VirtualBox -- Oracle VirtualBox is a virtual machine manager (version 4.3 or later) and can be downloaded from https://www.virtualbox.org/wiki/Downloads. It is needed to run the DataTorrent Sandbox.
-
DataTorrent Sandbox -- The sandbox can be downloaded from https://www.datatorrent.com/download. It is useful for testing simple applications since it contains Apache Hadoop and DataTorrent RTS pre-installed with a time-limited Enterprise License. If you already installed the RTS Enterprise Edition (evaluation or production license) on a cluster, you can use that setup for deployment and testing instead of the sandbox.
-
(Optional) If you prefer to use an IDE (Integrated Development Environment) such as NetBeans, Eclipse or IntelliJ, install that as well.
After installing these tools, make sure that the directories containing the executable files are in your PATH environment; for example, for the JDK executables like java and javac, the directory might be something like C:\Program Files\Java\jdk1.7.0\_80\bin
; for git it might be C:\Program Files\Git\bin
; and for maven it might be C:\Users\user\Software\apache-maven-3.3.3\bin
. Open a console window and enter the command:
echo %PATH%
to see the value of the PATH
variable and verify that the above directories are present. If not, you can change its value clicking on the button at Control Panel ⇨ Advanced System Settings ⇨ Advanced tab ⇨ Environment Variables.
Now run the following commands and ensure that the output is something similar to that shown in the table below:
Command |
Output |
javac -version |
javac 1.7.0_80 |
java -version |
java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) |
git --version |
git version 2.6.1.windows.1 |
mvn --version |
Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T06:57:37-05:00) Maven home: C:\Users\user\Software\apache-maven-3.3.3\bin\.. Java version: 1.7.0_80, vendor: Oracle Corporation Java home: C:\Program Files\Java\jdk1.7.0_80\jre Default locale: en_US, platform encoding: Cp1252 OS name: "windows 8", version: "6.2", arch: "amd64", family: "windows" |
Installing the Sandbox
The sandbox includes, as noted above, a complete, stand-alone, instance of the Datatorrent RTS Enterprise Edition configured as a single-node Hadoop cluster. Please see DataTorrent RTS Sandbox for details on setting up the sandbox.
You can choose to develop either directly on the sandbox or on your development machine. The advantage of the former is that most of the tools (e.g. jdk, git, maven) are pre-installed and also the package files created by your project are directly available to the DataTorrent tools such as dtManage and Apex CLI. The disadvantage is that the sandbox is a memory-limited environment so running a memory-hungry tool like a Java IDE on it may starve other applications of memory.
Creating a new Project
You can now use the maven archetype to create a basic Apache Apex project as follows: Put these lines in a Windows command file called, for example, newapp.cmd
and run it (the
value for archetypeVersion
can be a more recent version if available):
@echo off
@rem Script for creating a new application
setlocal
mvn -B archetype:generate ^
-DarchetypeGroupId=org.apache.apex ^
-DarchetypeArtifactId=apex-app-archetype ^
-DarchetypeVersion=3.6.0-SNAPSHOT ^
-DgroupId=com.example ^
-Dpackage=com.example.myapexapp ^
-DartifactId=myapexapp ^
-Dversion=1.0-SNAPSHOT
endlocal
The caret (^) at the end of some lines indicates that a continuation line follows.
This command will eventually become outdated, check the apex-app-archetype README for the latest version, which includes the most recent archetypeVersion.
You can also, if you prefer, use an IDE to generate the project as described in Creating a New Apache Apex Project with your IDE.
When the run completes successfully, you should see a new directory named myapexapp
containing a maven project for building a basic Apache Apex application. It includes 3 source files:Application.java, RandomNumberGenerator.java and ApplicationTest.java. You can now build the application by stepping into the new directory and running the appropriate maven command:
cd myapexapp
mvn clean package -DskipTests
The build should create the application package file myapexapp\target\myapexapp-1.0-SNAPSHOT.apa
. This file can then be uploaded to the DataTorrent GUI tool (dtManage) on your cluster if you have DataTorrent RTS installed there, or on the sandbox and launched from there. It generates a stream of random numbers and prints them out, each prefixed by the string hello world:
.
If you built this package on the host, you can transfer it to the sandbox using either a shared folder or the pscp
tool bundled with PuTTY
mentioned earlier.
You can also run this application from the generated unit test file as described in the next section.
Running Unit Tests
To run unit tests on Linux or OSX, simply run the usual maven command, for example: mvn test
to run all tests or mvn -Dcom.example.myapexapp.ApplicationTest#testApplication test
to run
a selected test. For the default application generated from the archetype, it should
print output like this:
-------------------------------------------------------
TESTS
-------------------------------------------------------
Running com.example.mydtapp.ApplicationTest
hello world: 0.8015370953286478
hello world: 0.9785359225545481
hello world: 0.6322611586644047
hello world: 0.8460953663451775
hello world: 0.5719372906929072
...
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.863
sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
On Windows, an additional file, winutils.exe
, is required; download it from
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
and unpack the archive to, say, C:\hadoop
; this file should be present under
hadoop-common-2.2.0-bin-master\bin
within it.
Set the HADOOP_HOME
environment variable system-wide to
c:\hadoop\hadoop-common-2.2.0-bin-master
as described at:
https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx?mfr=true. You should now be able to run unit tests normally.
If you prefer not to set the variable globally, you can set it on the command line or within
your IDE. For example, on the command line, specify the maven
property hadoop.home.dir
:
mvn -Dhadoop.home.dir=c:\hadoop\hadoop-common-2.2.0-bin-master test
or set the environment variable separately:
set HADOOP_HOME=c:\hadoop\hadoop-common-2.2.0-bin-master
mvn test
Within your IDE, set the environment variable and then run the desired unit test in the usual way. For example, with NetBeans you can add:
Env.HADOOP_HOME=c:/hadoop/hadoop-common-2.2.0-bin-master
at Properties ⇨ Actions ⇨ Run project ⇨ Set Properties.
Similarly, in Eclipse (Mars) add it to the project properties at Properties ⇨ Run/Debug Settings ⇨ ApplicationTest ⇨ Environment tab.
Building the Sources
The Apache Apex source code is useful to have locally for a variety of reasons:
- It has more substantial demo applications which can serve as models for new
applications in the same or similar domain.
- When extending a class, it is helpful to refer to the base class implementation
of overrideable methods.
- The maven build file pom.xml
can be a useful model when you need to add or delete
plugins, dependencies, profiles, etc.
- Browsing the code is a good way to gain a deeper understanding of the platform.
You can download and build the source repositories by running the script build-apex.cmd
located in the same place in the examples repository described above. Alternatively, if
you do not want to use the script, you can follow these simple manual steps:
-
Check out the source code repositories:
git clone https://github.com/apache/incubator-apex-core git clone https://github.com/apache/incubator-apex-malhar
-
Switch to the appropriate release branch and build each repository:
pushd incubator-apex-core mvn clean install -DskipTests popd pushd incubator-apex-malhar mvn clean install -DskipTests popd
The install
argument to the mvn
command installs resources from each project to your
local maven repository (typically .m2/repository
under your home directory), and
not to the system directories, so Administrator privileges are not required.
The -DskipTests
argument skips running unit tests since they take a long time. If this is a first-time installation, it might take several minutes to complete because maven will download a number of associated plugins.
After the build completes, you should see application package files in the target
directories under each module; for example, the Pi Demo
package file
pi-demo-3.2.1-incubating-SNAPSHOT.apa
should be under
incubator-apex-malhar\demos\pi\target
.
Linux
Most of the instructions for Linux (and other Unix-like systems) are similar to those for Windows described above, so we will just note the differences.
The pre-requisites (such as git, maven, etc.) are the same as for Windows described above; please run the commands in the table and ensure that appropriate versions are present in your PATH environment variable (the command to display that variable is: echo $PATH
).
The maven archetype command is the same except that continuation lines use a backslash (\
) instead of caret (^
); the script for it is available in the same location and is named newapp
(without the .cmd
extension). The script to checkout and build the Apache Apex repositories is named build-apex
.