This deck presents the high-level Apache SystemML design and architecture, covering the language, compiler, and runtime modules. It describes how the compilation chain is constructed and how live variable analysis is performed, shows HOP and runtime plans for a sample use case, and demonstrates how to gather statistics and use the diagnostic tools.
5. SystemML Design
[Architecture diagram]
• Inputs: DML (Declarative Machine Learning Language) scripts + data
• Data: 1. on disk/HDFS, 2. RDD/DataFrame, 3. double[][]
• Backends: Hadoop cluster (scale-out, since 2010), in-memory single node (scale-up, since 2012), Spark cluster (scale-out, since 2015)
• Hybrid execution plans*, e.g.:
  CP + b sb _mVar1
  SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
  CP * y _mVar2 _mVar3
6. SystemML Design
[Architecture diagram, as before: DML scripts + data (1. on disk/HDFS, 2. RDD/DataFrame, 3. double[][]); Hadoop or Spark cluster (scale-out, since 2010), in-memory single node (scale-up, since 2012)]
• Command line API* (also MLContext*)
• -exec hadoop
7. SystemML Design
[Architecture diagram, as before, highlighting the in-memory single node (scale-up, since 2012)]
• Two options for single-node execution:
1. -exec singlenode
2. Use the standalone jar (preserves rewrites, but may spawn local MR jobs)
• Command line API* (also MLContext*)
12. From DML to Execution Plan
[Architecture diagram, as before: DML scripts in DML (Declarative Machine Learning Language) + data; Hadoop (since 2010), single node (since 2012), Spark (since 2015)]
• Hybrid execution plans*, e.g.:
  CP + b sb _mVar1
  SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
  CP * y _mVar2 _mVar3
13. From DML to Execution Plan
[Architecture diagram, as before, now showing SystemML's three layers: Language, Compiler, Runtime]
• Hybrid execution plans*, e.g.:
  CP + b sb _mVar1
  SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
  CP * y _mVar2 _mVar3
• Assuming an example dataset X: 100M x 500, y: 100M x 1, b/sb: 500 x 1
15. SystemML Compilation Chain
• Parsing
• Parse input DML/PyDML using ANTLR v4 (see Dml.g4 and Pydml.g4)
• Perform syntactic validation
• Construct DMLProgram (=> list of statement and function blocks)
• Live Variable Analysis
• Classic dataflow analysis
• A variable is “live” if it holds a value that may be needed in the future
• Dead code elimination
• Semantic Validation
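Live variable analysis and the dead code elimination it enables can be illustrated with a small Python sketch over straight-line statements (a hypothetical simplified representation, not SystemML's actual IR):

```python
def eliminate_dead_code(stmts, outputs):
    """Backward liveness pass over straight-line statements.

    Each statement is (target, used_vars). A variable is live if it holds
    a value that may still be needed later; assignments to dead variables
    are dropped (dead code elimination).
    """
    live = set(outputs)          # variables still needed after the program
    kept = []
    for target, used in reversed(stmts):
        if target in live:
            kept.append((target, used))
            live = (live - {target}) | set(used)   # liveness transfer function
        # else: target is dead at this point, so the statement is removed
    kept.reverse()
    return kept

# a = f(X); b = g(a); c = h(X); d = k(b); only d is a program output,
# so the assignment to c is dead and gets eliminated
stmts = [("a", ["X"]), ("b", ["a"]), ("c", ["X"]), ("d", ["b"])]
eliminate_dead_code(stmts, ["d"])
```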
16. SystemML Compilation Chain
• Dataflow in DAGs of operations on matrices, frames, and scalars
• Choosing from alternative execution plans based on memory and cost estimates
• Operator ordering & selection; hybrid plans
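As a toy illustration of memory-based plan selection (the all-estimates-fit rule and budget handling are simplifications; the real optimizer also weighs cost estimates, intermediates, and cluster configuration):

```python
def mem_estimate_mb(rows, cols):
    # Worst-case dense estimate: 8 bytes per double cell
    return rows * cols * 8 / (1024 * 1024)

def choose_exec_type(input_shapes, output_shape, cp_budget_mb):
    """Pick a single-node CP operator when all input and output memory
    estimates fit the local budget, otherwise a distributed SPARK operator
    (simplified sketch of SystemML-style execution-type selection)."""
    total = sum(mem_estimate_mb(r, c) for r, c in input_shapes + [output_shape])
    return "CP" if total <= cp_budget_mb else "SPARK"

# t(X) %*% y with X: 100,000 x 1,000 (~763 MB dense) and y: 100,000 x 1;
# everything fits a 1434 MB budget, so the operator runs in CP
shapes = [(100000, 1000), (100000, 1)]
choose_exec_type(shapes, (1000, 1), 1434)   # -> "CP"
```

Note how the 763 MB estimate for a 100,000 x 1,000 dense matrix matches the memory annotations in the explain output shown later in this deck.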
18. SystemML Compilation Chain
• Low-level physical execution plan (LOP DAGs)
• Over key-value pairs for MR
• Over RDDs for Spark
• “Piggybacking” operations into a minimal number of Map-Reduce jobs
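The piggybacking idea can be sketched as greedy packing of operations into jobs (a toy model with hypothetical operations; the real algorithm also distinguishes map/reduce phases and further constraints):

```python
def piggyback(ops, initial_data):
    """Greedily pack topologically ordered operations into a minimal
    number of jobs, under the simplification that a job's outputs only
    become readable after the job finishes."""
    produced_in = {d: -1 for d in initial_data}   # inputs precede all jobs
    jobs = []
    for name, inputs in ops:
        # earliest job in which all inputs are already materialized
        earliest = max(produced_in[i] for i in inputs) + 1
        if earliest == len(jobs):
            jobs.append([])
        jobs[earliest].append(name)    # piggyback onto an existing job
        produced_in[name] = earliest
    return jobs

# five operations over input X packed into three jobs:
ops = [("a", ["X"]), ("b", ["X"]), ("c", ["a"]), ("d", ["b", "X"]), ("e", ["c", "d"])]
piggyback(ops, ["X"])   # -> [["a", "b"], ["c", "d"], ["e"]]
```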
19. SystemML Compilation Chain
• Generated runtime plan (Spark):
  CP + b sb _mVar1
  SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
  CP * y _mVar2 _mVar3
20. SystemML Runtime
• Hybrid Runtime
• CP: single-machine operations & job orchestration
• MR: generic Map-Reduce jobs & operations
• SP: Spark jobs
• Numerically stable operators
• Dense / sparse matrix representations
• Multi-level buffer pool (caching) for evicting in-memory objects
• Dynamic recompilation for initial unknowns
[Runtime architecture diagram: the control program runs the runtime program and issues CP, Spark, and MR instructions; a buffer pool mediates Mem/FS IO and DFS IO; further components: ParFor optimizer/runtime, recompiler, generic MR jobs, and the MatrixBlock library (single/multi-threaded)]
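The buffer-pool eviction idea can be sketched as an LRU cache backed by a spill store (a toy sketch only; SystemML's actual buffer pool serializes evicted matrix blocks to local disk):

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU buffer pool: hot objects stay in memory; least-recently-used
    ones are evicted to a spill store once the memory budget is exceeded."""

    def __init__(self, budget):
        self.budget = budget
        self.mem = OrderedDict()   # name -> (object, size), in LRU order
        self.spilled = {}          # stand-in for local FS storage
        self.used = 0

    def put(self, name, obj, size):
        self.mem[name] = (obj, size)
        self.mem.move_to_end(name)
        self.used += size
        while self.used > self.budget and len(self.mem) > 1:
            victim, (vobj, vsize) = self.mem.popitem(last=False)  # evict LRU
            self.spilled[victim] = (vobj, vsize)
            self.used -= vsize

    def get(self, name):
        if name in self.mem:
            self.mem.move_to_end(name)          # refresh LRU position
            return self.mem[name][0]
        obj, size = self.spilled.pop(name)      # fault back into memory
        self.put(name, obj, size)
        return obj

pool = BufferPool(budget=10)
pool.put("A", "blockA", 6)
pool.put("B", "blockB", 6)   # over budget: "A" (least recently used) spills
pool.get("A")                # faults "A" back in, spilling "B"
```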
21. From DML to Execution Plan
[Architecture diagram, as before: Language, Compiler, Runtime layers compiling LinearRegression.dml over varying data sizes]
• Hybrid execution plans*, e.g.:
  CP + b sb _mVar1
  SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
  CP * y _mVar2 _mVar3
22. A Data Scientist – Linear Regression
[Diagram: model X w ≈ y, with X the explanatory/independent variables, y the predicted/dependent variable, and w the model]
• Optimization problem: w = argmin_w ||Xw - y||^2 + λ||w||^2
• Conjugate Gradient Method (initialize, then iterate until convergence; per step: step size, update w, accuracy measures, next direction):
• Start off with the (negative) gradient as the initial direction
• For each step:
1. Move to the optimal point along the chosen direction
2. Recompute the gradient
3. Project it onto the subspace conjugate* to all prior directions
4. Use this as the next direction
(* conjugate = orthogonal given A as the metric, where A = X^T X + λI)
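The conjugate gradient loop above maps to a short pure-Python sketch (toy data; SystemML's actual implementation is the LinearRegCG DML script):

```python
def cg_linreg(X, y, lam, max_iter=100, tol=1e-12):
    """Solve (X^T X + lam*I) w = X^T y by conjugate gradient, applying
    A = X^T X + lam*I matrix-free via two matrix-vector products."""
    m, n = len(X), len(X[0])

    def apply_A(v):
        Xv = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
        XtXv = [sum(X[i][j] * Xv[i] for i in range(m)) for j in range(n)]
        return [XtXv[j] + lam * v[j] for j in range(n)]

    b = [sum(X[i][j] * y[i] for i in range(m)) for j in range(n)]  # X^T y
    w = [0.0] * n
    r = b[:]                      # residual = negative gradient at w = 0
    p = r[:]                      # initial direction
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rr < tol:              # accuracy measure: squared residual norm
            break
        Ap = apply_A(p)
        alpha = rr / sum(pi * api for pi, api in zip(p, Ap))   # step size
        w = [wi + alpha * pi for wi, pi in zip(w, p)]          # update w
        r = [ri - alpha * api for ri, api in zip(r, Ap)]       # new gradient
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]  # next conjugate direction
        rr = rr_new
    return w

# toy data: X w = y is solved exactly by w = [1, 2]
w = cg_linreg([[1, 0], [0, 1], [1, 1]], [1, 2, 3], lam=0.0)
```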
23. SystemML – Run LinReg CG on Spark
[Figure: running LinReg CG on Spark over varying data sizes; y: 100M x 1 throughout; cluster: 20 GB driver on 16 cores, 6 x 55 GB executors]
• X: 100M x 10 (8 GB): multithreaded single node, fused tMM operator on the driver
• X: 100M x 100 (80 GB): hybrid plan with RDD caching and fused operator; X.persist(); X.mapValues(tMMv).reduce() on the executors (RDD cache: X)
• X: 100M x 1,000 (800 GB): hybrid plan with RDD out-of-core and fused operator; same tMMv pipeline, with spilling on the executors
• X: 100M x 10,000 (8 TB): hybrid plan with RDD out-of-core and different operators; two matrix-vector multiplications (Mv, tvM) with broadcast, mapToPair, and reduceByKey (driver cache)
24. Agenda
• Architecture Overview
• Language & APIs
• Compiler
• Runtime
• Two examples:
• Simple DML expression with an example dataset
• Linear Regression with varying data sizes
• Tooling
• Important links
26. Explain (Understanding Execution Plans)
• Overview
• Shows generated execution plan (at different compilation steps)
• Introduced 05/2014 for internal usage
• Important tool for understanding/debugging optimizer choices!
• Usage
• hadoop jar SystemML.jar -f test.dml -explain
[hops | runtime | recompile_hops | recompile_runtime]
• hops
• Program w/ HOP DAGs after optimization
• runtime (default)
• Program w/ generated runtime instructions
• recompile_hops
• hops, plus the HOP DAG after every recompile
• recompile_runtime
• runtime, plus the generated runtime instructions after every recompile
27. Explain: Understanding HOP DAGs (simple DML)
• HOP ID
• HOP opcode
• HOP input data dependencies (via HOP IDs)
• HOP output matrix characteristics (rlen, clen, brlen, bclen, nnz)
• HOP memory estimates (all inputs, intermediates, output -> operation mem)
• HOP execution type (CP/SP/MR)
• Optional: indicators of reblock/checkpointing (caching) of HOP outputs
• Usage: -explain hops, or -explain recompile_hops
• Example: spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
(Diagram callout on --driver-memory: broadcast memory budget)
28. Explain: Understanding HOP DAGs (entire script)
• Example DML Script (Simplified LinregDS)
X = read($1);
y = read($2);
intercept = $3;
lambda = $4;
if( intercept == 1 ) {
  ones = matrix(1, nrow(X), 1);
  X = append(X, ones);
}
I = matrix(1, ncol(X), 1);
A = t(X) %*% X + diag(I*lambda);
b = t(X) %*% y;
beta = solve(A, b);
write(beta, $5);
Invocation:
  hadoop jar SystemML.jar -f linregds.dml -args X y 0 0 beta
Scenario:
  X: 100,000 x 1,000, sparsity 1.0
  y: 100,000 x 1, sparsity 1.0
  (800 MB, 200+ GFLOP)
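The script computes beta by solving the regularized normal equations A beta = b with A = t(X) %*% X + diag(I*lambda) and b = t(X) %*% y. A minimal pure-Python sketch of the same computation (hypothetical toy data; Gaussian elimination stands in for DML's solve()):

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    # Gaussian elimination with partial pivoting on [A | b]
    n = len(A)
    M = [A[i][:] + [b[i][0]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return [[xi] for xi in x]

def linreg_ds(X, y, lam):
    # A = t(X) %*% X + diag(I*lambda);  b = t(X) %*% y;  beta = solve(A, b)
    Xt = transpose(X)
    A = matmul(Xt, X)
    for i in range(len(A)):
        A[i][i] += lam
    b = matmul(Xt, y)
    return solve(A, b)

# toy data: the exact least-squares solution is beta = [[1], [2]]
beta = linreg_ds([[1, 0], [0, 1], [1, 1]], [[1], [2], [3]], lam=0.0)
```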
29. Explain: Understanding HOP DAGs (2)
• Explain Hops
15/07/05 17:18:06 INFO api.DMLScript: EXPLAIN (HOPS):
# Memory Budget local/remote = 57344MB/1434MB/1434MB
# Degree of Parallelism (vcores) local/remote = 24/144/72
PROGRAM
--MAIN PROGRAM
----GENERIC (lines 1-4) [recompile=false]
------(10) PRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
------(11) TWrite X (10) [100000,1000,1000,1000,100000000] [763,0,0 -> 763MB], CP
------(21) PRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
------(22) TWrite y (21) [100000,1,1000,1000,100000] [1,0,0 -> 1MB], CP
------(24) TWrite intercept [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
------(26) TWrite lambda [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
----GENERIC (lines 11-16) [recompile=false]
------(42) TRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
------(52) r(t) (42) [1000,100000,1000,1000,100000000] [763,0,763 -> 1526MB]
------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP
------(43) TRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
------(59) ba(+*) (52,43) [1000,1,1000,1000,-1] [764,0,0 -> 764MB], CP
------(60) b(solve) (53,59) [1000,1,1000,1000,-1] [8,8,0 -> 15MB], CP
------(66) PWrite beta (60) [1000,1,-1,-1,-1] [0,0,0 -> 0MB], CP
[Diagram callouts: cluster characteristics (memory budgets, vcores); program structure (incl. recompile flags); unrolled HOP DAG]
Note: the if branch (lines 6-9) and the regularization term were removed by rewrites
34. Agenda
• Architecture Overview
• Language & APIs
• Compiler
• Runtime
• Two examples:
• Simple DML expression with an example dataset
• Linear Regression with varying data sizes
• Tooling
• Important links
37. Important Links
• Website: http://systemml.apache.org/
• Interested in SystemML?
• Go to https://github.com/apache/incubator-systemml and “Star it”
• Want to contribute to SystemML?
• See http://apache.github.io/incubator-systemml/contributing-to-systemml.html
• List of issues: https://issues.apache.org/jira/browse/SYSTEMML/
• Ask any of our PMC members for suggestions
• Want to try out SystemML?
• Laptop: http://apache.github.io/incubator-systemml/quick-start-guide.html
(Does not require Hadoop/Spark installation)
• Spark Cluster: http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html (Includes Jupyter/Zeppelin demo)