SystemML Architecture
Niketan Pansare, Berthold Reinwald
July 25th, 2016
Agenda
• High-level Design & APIs
• Architecture Overview
• Tooling
• Important links
From http://systemml.apache.org/
Agenda
• High-level Design & APIs
• Architecture Overview
  • Language
  • Compiler
  • Runtime
  • Two examples:
    • Simple DML expression with an example dataset
    • Linear Regression with varying data sizes
• Tooling
• Important links
SystemML Design
• Input: DML (Declarative Machine Learning Language) scripts and data
• Data can be provided as:
  1. On disk/HDFS
  2. RDD/DataFrame
  3. double[][]
• SystemML compiles the scripts into hybrid execution plans*, for example:
    CP + b sb _mVar1
    SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
    CP * y _mVar2 _mVar3
• Backends: in-memory single node (scale-up, since 2012); Hadoop (since 2010) or Spark (since 2015) cluster (scale-out)
SystemML Design
• Backends shown: Hadoop or Spark cluster (scale-out, since 2010); in-memory single node (scale-up, since 2012)
• API: Command line API* (also MLContext*)
• Option: -exec hadoop
SystemML Design
• Backend shown: in-memory single node (scale-up, since 2012)
• API: Command line API* (also MLContext*)
• Two options:
  1. -exec singlenode
  2. Use the standalone jar (preserves rewrites, but may spawn local MR jobs)
SystemML Design
• Backends shown: Spark cluster (scale-out, since 2015); in-memory single node (scale-up, since 2012)
• API: Command line API* (also MLContext*)
SystemML Design
• Backends shown: Spark cluster (scale-out, since 2015); in-memory single node (scale-up, since 2012)
• API: MLContext API (Java/Python/Scala)
• https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
SystemML Design
• Backend shown: in-memory single node (scale-up, since 2012)
• API: JMLC API
• https://apache.github.io/incubator-systemml/jmlc.html
From DML to Execution Plan
• DML (Declarative Machine Learning Language) scripts and data pass through three layers of SystemML: Language, Compiler, Runtime
• The result is a hybrid execution plan* that runs on an in-memory single node (scale-up, since 2012) or on a Hadoop (since 2010) or Spark (since 2015) cluster (scale-out)
• Example plan, assuming an example dataset with X: 100M x 500, y: 100M x 1, b/sb: 500 x 1:
    CP + b sb _mVar1
    SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
    CP * y _mVar2 _mVar3
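The three sample instructions can be read as the expression y * (X %*% (b + sb)). The following is a minimal Python sketch of that reading, not SystemML code. It assumes that tiny in-memory lists stand in for the matrices, and that mapmm denotes a matrix multiply in which the small right-hand operand is shipped to every partition of X. The helper names (`elementwise`, `matmul`, `mapmm`) are hypothetical.

```python
# Toy stand-ins for the matrices; on the slide X is 100M x 500.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2
y = [[1.0], [0.5], [2.0]]                  # 3 x 1
b = [[0.5], [0.25]]                        # 2 x 1
sb = [[0.5], [0.75]]                       # 2 x 1


def elementwise(op, left, right):
    """Apply a binary operator cell by cell to two equally sized matrices."""
    return [[op(l, r) for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


def matmul(left, right):
    """Plain dense matrix multiply on lists of rows."""
    columns = list(zip(*right))
    return [[sum(l * r for l, r in zip(row, col)) for col in columns]
            for row in left]


def mapmm(partitions, small):
    """Multiply each row partition of the large matrix by the small
    right-hand operand, then concatenate the partial results."""
    result = []
    for part in partitions:
        result.extend(matmul(part, small))
    return result


# CP + b sb _mVar1: small operands, computed locally.
m_var1 = elementwise(lambda p, q: p + q, b, sb)

# SPARK mapmm X _mVar1 _mVar2: X is split into row partitions.
x_partitions = [X[:2], X[2:]]
m_var2 = mapmm(x_partitions, m_var1)

# CP * y _mVar2 _mVar3: element-wise multiply of two column vectors.
m_var3 = elementwise(lambda p, q: p * q, y, m_var2)
```

Only the middle step touches the large matrix, which is consistent with it being the one instruction marked SPARK in the sample plan.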
SystemML Compilation Chain
• Parsing
  • Parse input DML/PyDML using ANTLR v4 (see Dml.g4 and Pydml.g4)
  • Perform syntactic validation
  • Construct DMLProgram (=> list of statement and function blocks)
• Live Variable Analysis
  • Classic dataflow analysis
  • A variable is “live” if it holds a value that may be needed in the future
  • Dead code elimination
• Semantic Validation
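Live variable analysis and the dead code elimination it enables can be illustrated on straight-line code. This is a simplified sketch, assuming each statement is a side-effect-free assignment given as a (target, used variables) pair; it ignores control flow and is not SystemML's implementation. The function and variable names are hypothetical.

```python
def eliminate_dead_code(statements, live_out):
    """Walk the statements backwards, tracking which variables are live.

    statements: list of (target, [variables read]) assignments.
    live_out:   variables needed after the last statement.
    Returns (kept statements, variables live before the first statement).
    """
    live = set(live_out)
    kept = []
    for target, uses in reversed(statements):
        if target in live:
            # The value is needed later: keep the statement, and its
            # inputs become live in place of the target.
            live.discard(target)
            live.update(uses)
            kept.append((target, uses))
        # Otherwise the assignment is dead and is dropped.
    kept.reverse()
    return kept, live


program = [
    ("A", ["X"]),
    ("B", ["X", "y"]),
    ("C", ["A"]),
    ("unused", ["B"]),
    ("out", ["C"]),
]
kept, live_in = eliminate_dead_code(program, live_out={"out"})
```

In this toy program `unused` is never read, so it is dropped, which in turn makes `B` dead as well.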
SystemML Compilation Chain
• Dataflow in DAGs of operations on matrices, frames, and scalars
• Choosing from alternative execution plans based on memory and cost estimates
• Operator ordering & selection; hybrid plans
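A rough sketch of how a memory estimate can drive the choice of execution type. It assumes dense double-precision matrices (8 bytes per cell) and a single local memory budget. The budget value and the function names are hypothetical, and a real optimizer would also consider sparsity, intermediates, and cost.

```python
BYTES_PER_DOUBLE = 8


def dense_size_mb(rows, cols):
    """Dense in-memory size of a rows x cols double matrix in MB."""
    return rows * cols * BYTES_PER_DOUBLE / (1024 * 1024)


def choose_exec_type(operand_dims, budget_mb):
    """Pick CP if all operands (inputs and output) fit in the local
    budget, otherwise fall back to a distributed SPARK operator."""
    total_mb = sum(dense_size_mb(rows, cols) for rows, cols in operand_dims)
    return "CP" if total_mb <= budget_mb else "SPARK"


LOCAL_BUDGET_MB = 14336  # hypothetical budget, for illustration only

# b + sb: two 500 x 1 inputs and a 500 x 1 output.
add_type = choose_exec_type([(500, 1), (500, 1), (500, 1)], LOCAL_BUDGET_MB)

# X %*% v: a 100M x 500 input, a 500 x 1 input, and a 100M x 1 output.
mm_type = choose_exec_type(
    [(100_000_000, 500), (500, 1), (100_000_000, 1)], LOCAL_BUDGET_MB)
```

Under these assumptions a dense 100,000 x 1,000 matrix comes out at about 763 MB, and the 100M x 500 matrix of the example dataset at several hundred GB, far beyond any single-node budget.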
SystemML Compilation Chain
* Discussed later in Tooling
spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
SystemML Compilation Chain
• Low-level physical execution plan (LOP DAGs)
  • Over key-value pairs for MR
  • Over RDDs for Spark
• “Piggybacking” operations into a minimal number of MapReduce jobs
SystemML Compilation Chain
Spark
CP + b sb _mVar1
SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
CP * y _mVar2 _mVar3
SystemML Runtime
• Hybrid Runtime
  • CP: single-machine operations & job orchestration
  • MR: generic MapReduce jobs & operations
  • SP: Spark jobs
• Numerically stable operators
• Dense / sparse matrix representation
• Multi-level buffer pool (caching) to evict in-memory objects
• Dynamic recompilation for initial unknowns
[Figure: runtime architecture. Control Program with Runtime Program, Buffer Pool, ParFor Optimizer/Runtime, and Recompiler; CP, Spark, and MR instructions; Generic MR Jobs; MatrixBlock Library (single/multi-threaded); Mem/FS IO and DFS IO]
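As an illustration of what a numerically stable operator can look like, here is a sketch of compensated (Kahan) summation. That this technique is representative of the operators meant on the slide is an assumption. The code is a generic textbook version, not taken from SystemML.

```python
import math


def kahan_sum(values):
    """Sum floats while carrying a running compensation term that
    captures the low-order bits lost in each addition."""
    total = 0.0
    compensation = 0.0
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; subtracting
        # the intended increment leaves the rounding error.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


values = [0.1] * 1000

naive = 0.0
for v in values:
    naive += v                      # plain left-to-right summation

stable = kahan_sum(values)
reference = math.fsum(values)       # correctly rounded sum from the stdlib
```

On long aggregations such as column sums over many rows, the rounding error of plain summation grows with the number of terms, while the compensated version stays close to the reference.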
From DML to Execution Plan
• Second example: LinearRegression.dml with varying data sizes
• DML (Declarative Machine Learning Language) scripts and data again pass through Language, Compiler, and Runtime to produce hybrid execution plans*
• Backends: in-memory single node (scale-up, since 2012); Hadoop (since 2010) or Spark (since 2015) cluster (scale-out)
[Figure: pipeline diagram with the sample plan CP + b sb _mVar1; SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE; CP * y _mVar2 _mVar3]
A Data Scientist – Linear Regression
• Model: $Xw \approx y$, with X the explanatory/independent variables, y the predicted/dependent variable, and w the model
• Optimization problem: $w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2$
• Conjugate Gradient Method:
  • Start off with the (negative) gradient
  • For each step
    1. Move to the optimal point along the chosen direction;
    2. Recompute the gradient;
    3. Project it onto the subspace conjugate* to all prior directions;
    4. Use this as the next direction
  • (* conjugate = orthogonal given A as the metric), $A = X^\top X + \lambda I$
• [Figure: flowchart with the steps initialize, initial direction, step size, update w, accuracy measures, next direction; iterate until convergence]
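The conjugate gradient steps can be sketched in plain Python. This is a textbook CG solver for $(X^\top X + \lambda I)\,w = X^\top y$ on small lists, assuming dense data and a zero starting point; it is not the LinearRegression.dml script. The helper names and the tolerance are hypothetical.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linreg_cg(X, y, lam, max_iter=100, tol=1e-12):
    """Solve (X^T X + lam * I) w = X^T y with conjugate gradient."""
    Xt = transpose(X)

    def apply_A(v):
        # A v = X^T (X v) + lam * v, without materializing A.
        xtxv = matvec(Xt, matvec(X, v))
        return [a + lam * b for a, b in zip(xtxv, v)]

    w = [0.0] * len(Xt)          # initialize
    r = matvec(Xt, y)            # residual at w = 0 (negative gradient up to a factor)
    p = list(r)                  # initial direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old <= tol:        # converged
            break
        Ap = apply_A(p)
        alpha = rs_old / dot(p, Ap)                        # step size
        w = [wi + alpha * pi for wi, pi in zip(w, p)]      # update w
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # new residual
        rs_new = dot(r, r)
        # Next direction: residual made conjugate to the prior directions.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = linreg_cg(X, y, lam=0.0)
```

Each iteration needs X only through the product $X^\top (X p)$, so the matrix $A$ is never formed explicitly.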
SystemML – Run LinReg CG on Spark
• Setup: 20 GB driver on 16 cores, 6 x 55 GB executors; y is 100M x 1 in all cases
• X: 100M x 10 (8 GB): multithreaded single node (operator label: tMMp)
• X: 100M x 100 (80 GB): hybrid plan with RDD caching and fused operator
    X.persist(); ... X.mapValues(tMMv).reduce() ...
• X: 100M x 1,000 (800 GB): hybrid plan with RDD out-of-core (spilling) and fused operator
    X.persist(); ... X.mapValues(tMMv).reduce() ...
• X: 100M x 10,000 (8 TB): hybrid plan with RDD out-of-core and different operators
    X.persist(); ... // 2 MxV mult with broadcast, mapToPair, and reduceByKey (operator labels: Mv, tvM)
[Figure: for each case, driver and executors with the RDD cache holding X]
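The fused operator can be sketched as follows, assuming tMMv stands for t(X) %*% (X %*% v) evaluated in one pass over each row block, with the per-block results summed afterwards. That reading is inferred from the slide's labels, and the function names are hypothetical.

```python
def tmmv_block(block, v):
    """Fused t(Xb) %*% (Xb %*% v) for one row block: each row is read
    once, and no intermediate Xb %*% v vector is materialized."""
    result = [0.0] * len(v)
    for row in block:
        row_dot_v = sum(x * vi for x, vi in zip(row, v))
        for j, x in enumerate(row):
            result[j] += x * row_dot_v
    return result


def tmmv(blocks, v):
    """Map the fused operator over all row blocks, then reduce by
    adding the partial vectors (mirrors mapValues(...).reduce())."""
    partials = [tmmv_block(block, v) for block in blocks]
    return [sum(column) for column in zip(*partials)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
v = [1.0, 1.0]
blocks = [X[:2], X[2:]]          # two row partitions of X
fused = tmmv(blocks, v)
```

The fused form only works when a block holds complete rows, which is one plausible reason the widest case switches to two separate matrix-vector multiplications.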
SystemML’s Compilation Chain / Overview Tools
[Figure: compilation chain annotated with the tools EXPLAIN hops, EXPLAIN runtime, EXPLAIN *_recompile, STATS, and DEBUG]
• HOP (High-level operator)
• LOP (Low-level operator)
[Matthias Boehm et al.: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 2014]
Explain (Understanding Execution Plans)
• Overview
  • Shows the generated execution plan (at different compilation steps)
  • Introduced 05/2014 for internal usage
  • Important tool for understanding/debugging optimizer choices!
• Usage
  • hadoop jar SystemML.jar -f test.dml -explain [hops | runtime | recompile_hops | recompile_runtime]
  • hops
    • Program w/ HOP DAGs after optimization
  • runtime (default)
    • Program w/ generated runtime instructions
  • recompile_hops
    • See hops + HOP DAG after every recompile
  • recompile_runtime
    • See runtime + generated runtime instructions after every recompile
Explain: Understanding HOP DAGs (simple DML)
Spark
• HOP ID
• HOP opcode
• HOP input data dependencies (via HOP IDs)
• HOP output matrix characteristics (rlen, clen, brlen, bclen, nnz)
• HOP memory estimates (all inputs, intermediates, output → operation mem)
• HOP execution type (CP/SP/MR)
• Optional: indicators of reblock/checkpointing (caching) of HOP outputs
-explain hops
-explain recompile_hops
spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
[Figure callout: broadcast mem budget]
Explain: Understanding HOP DAGs (entire script)
• Example DML Script (Simplified LinregDS)
X = read($1);
y = read($2);
intercept = $3;
lambda = $4;
if( intercept == 1 ) {
  ones = matrix(1, nrow(X), 1);
  X = append(X, ones);
}
I = matrix(1, ncol(X), 1);
A = t(X) %*% X + diag(I*lambda);
b = t(X) %*% y;
beta = solve(A, b);
write(beta, $5);
Invocation:
hadoop jar SystemML.jar -f linregds.dml -args X y 0 0 beta
Scenario:
X: 100,000 x 1,000, sparsity 1.0
y: 100,000 x 1, sparsity 1.0
(800MB, 200+GFlop)
Explain: Understanding HOP DAGs (2)
• Explain Hops
15/07/05 17:18:06 INFO api.DMLScript: EXPLAIN (HOPS):
# Memory Budget local/remote = 57344MB/1434MB/1434MB
# Degree of Parallelism (vcores) local/remote = 24/144/72
PROGRAM
--MAIN PROGRAM
----GENERIC (lines 1-4) [recompile=false]
------(10) PRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
------(11) TWrite X (10) [100000,1000,1000,1000,100000000] [763,0,0 -> 763MB], CP
------(21) PRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
------(22) TWrite y (21) [100000,1,1000,1000,100000] [1,0,0 -> 1MB], CP
------(24) TWrite intercept [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
------(26) TWrite lambda [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
----GENERIC (lines 11-16) [recompile=false]
------(42) TRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
------(52) r(t) (42) [1000,100000,1000,1000,100000000] [763,0,763 -> 1526MB]
------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP
------(43) TRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
------(59) ba(+*) (52,43) [1000,1,1000,1000,-1] [764,0,0 -> 764MB], CP
------(60) b(solve) (53,59) [1000,1,1000,1000,-1] [8,8,0 -> 15MB], CP
------(66) PWrite beta (60) [1000,1,-1,-1,-1] [0,0,0 -> 0MB], CP
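Callouts on the output:
• Cluster characteristics: the "# Memory Budget" and "# Degree of Parallelism" lines
• Program structure (incl. recompile): the PROGRAM, MAIN PROGRAM, and GENERIC lines
• Unrolled HOP DAG: the numbered HOP lines
Notes: the if branch (6-9) and the regularization were removed by rewrites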
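For working with this output programmatically, a HOP line can be split into the fields listed earlier. The sketch below assumes the line layout seen in the sample output (ID, opcode, optional name, optional input IDs, five matrix characteristics, three memory estimates plus operation memory, optional execution type). The regular expression is inferred from that sample only and is not an official format specification; all names are hypothetical.

```python
import re

HOP_LINE = re.compile(
    r"\((?P<id>\d+)\)\s+"                    # HOP ID
    r"(?P<opcode>\S+)"                       # HOP opcode, e.g. ba(+*)
    r"(?:\s+(?P<name>[A-Za-z_]\w*))?"        # optional variable name
    r"(?:\s+\((?P<inputs>[\d,]+)\))?"        # optional input HOP IDs
    r"\s+\[(?P<dims>[-\d,]+)\]"              # rlen, clen, brlen, bclen, nnz
    r"\s+\[(?P<mem>[\d,]+) -> (?P<op_mem>\d+)MB\]"
    r"(?:,\s*(?P<exec>\w+))?"                # optional execution type
)


def parse_hop_line(line):
    """Return the fields of one HOP explain line as a dict, or None."""
    match = HOP_LINE.search(line)
    if match is None:
        return None
    rlen, clen, brlen, bclen, nnz = (
        int(d) for d in match.group("dims").split(","))
    inputs = match.group("inputs")
    return {
        "id": int(match.group("id")),
        "opcode": match.group("opcode"),
        "name": match.group("name"),
        "inputs": [int(i) for i in inputs.split(",")] if inputs else [],
        "rows": rlen, "cols": clen,
        "block_rows": brlen, "block_cols": bclen, "nnz": nnz,
        "mem_mb": [int(m) for m in match.group("mem").split(",")],
        "operation_mem_mb": int(match.group("op_mem")),
        "exec_type": match.group("exec"),
    }


hop = parse_hop_line(
    "------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] "
    "[1526,8,8 -> 1541MB], CP")
read_hop = parse_hop_line(
    "------(10) PRead X [100000,1000,1000,1000,100000000] "
    "[0,0,763 -> 763MB], CP")
```

An nnz of -1 marks an unknown number of non-zeros, as in the matrix multiplication output above.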
Explain: Understanding Runtime Plans (1)
• Explain Runtime (simplified filenames, removed rmvar)
15/07/05 17:18:53 INFO api.DMLScript: EXPLAIN (RUNTIME):
# Memory Budget local/remote = 57344MB/1434MB/1434MB
# Degree of Parallelism (vcores) local/remote = 24/144/72
PROGRAM ( size CP/MR = 25/0 )
--MAIN PROGRAM
----GENERIC (lines 1-4) [recompile=false]
------CP createvar pREADX X false binaryblock 100000 1000 1000 1000 100000000
------CP createvar pREADy y false binaryblock 100000 1 1000 1000 100000
------CP assignvar 0.SCALAR.INT.true intercept.SCALAR.INT
------CP assignvar 0.0.SCALAR.DOUBLE.true lambda.SCALAR.DOUBLE
------CP cpvar pREADX X
------CP cpvar pREADy y
----GENERIC (lines 11-16) [recompile=false]
------CP createvar _mVar2 .../_t0/temp1 true binaryblock 1000 1000 1000 1000 -1
------CP tsmm X.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 24
------CP createvar _mVar3 .../_t0/temp2 true binaryblock 1 100000 1000 1000 100000
------CP r' y.MATRIX.DOUBLE _mVar3.MATRIX.DOUBLE
------CP createvar _mVar4 .../_t0/temp3 true binaryblock 1 1000 1000 1000 -1
------CP ba+* _mVar3.MATRIX.DOUBLE X.MATRIX.DOUBLE _mVar4.MATRIX.DOUBLE 24
------CP createvar _mVar5 .../_t0/temp4 true binaryblock 1000 1 1000 1000 -1
------CP r' _mVar4.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE
------CP createvar _mVar6 .../_t0/temp5 true binaryblock 1000 1 1000 1000 -1
------CP solve _mVar2.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE _mVar6.MATRIX.DOUBLE
------CP write _mVar6.MATRIX.DOUBLE .../beta.SCALAR.STRING.true textcell.SCALAR.STRING.true
Note: the output is literally a string representation of the runtime instructions
Stats (Profiling Runtime Statistics)
• Overview
  • Profiles and shows aggregated runtime statistics of potential bottlenecks
  • Introduced 01/2014 for internal usage; extension of the buffer pool stats from 01/2013
  • Important tool for understanding runtime characteristics and for profiling/tuning system internals by developers
• Usage
  • hadoop jar SystemML.jar -f test.dml -stats
• SystemML Statistics output:
  • Total exec time
  • Buffer pool stats
  • Dynamic recompilation stats
  • JVM stats (JIT, GC)
  • Heavy hitter instructions (incl. buffer pool times)
  • Optional: parfor stats (if program contains parfors)
Debug (Script Debugging)
• Overview
  • Script-level debugging by end-users (and developers)
  • Introduced 09/2014 as the result of an intern project
  • gdb-inspired command-line debugger interface
• Usage
  • hadoop jar SystemML.jar -f test.dml -debug
Important Links
• Website: http://systemml.apache.org/
• Interested in SystemML?
  • Go to https://github.com/apache/incubator-systemml and “Star it”
• Want to contribute to SystemML?
  • See http://apache.github.io/incubator-systemml/contributing-to-systemml.html
  • List of issues: https://issues.apache.org/jira/browse/SYSTEMML/
  • Ask any of our PMC members for suggestions
• Want to try out SystemML?
  • Laptop: http://apache.github.io/incubator-systemml/quick-start-guide.html (Does not require Hadoop/Spark installation)
  • Spark Cluster: http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html (Includes Jupyter/Zeppelin demo)
Thank You

More Related Content

What's hot

Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...Flink Forward
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Apache Apex
 
Processors in a nutshell
Processors in a nutshellProcessors in a nutshell
Processors in a nutshellIkuru Kanuma
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareApache Apex
 
Fault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexFault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexApache Apex Organizer
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Apache Apex
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingApache Apex
 
Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Apache Apex
 
Stream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache ApexStream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache ApexApache Apex
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...Michael Stack
 
University program - writing an apache apex application
University program  - writing an apache apex applicationUniversity program  - writing an apache apex application
University program - writing an apache apex applicationAkshay Gore
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsThomas Weise
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationApache Apex
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupThomas Weise
 

What's hot (20)

Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
Processors in a nutshell
Processors in a nutshellProcessors in a nutshell
Processors in a nutshell
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Fault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexFault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache Apex
 
Hadoop - Apache Pig
Hadoop - Apache PigHadoop - Apache Pig
Hadoop - Apache Pig
 
Adam
AdamAdam
Adam
 
Parallel concepts1
Parallel concepts1Parallel concepts1
Parallel concepts1
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)Smart Partitioning with Apache Apex (Webinar)
Smart Partitioning with Apache Apex (Webinar)
 
Stream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache ApexStream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache Apex
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
University program - writing an apache apex application
University program  - writing an apache apex applicationUniversity program  - writing an apache apex application
University program - writing an apache apex application
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Apex as yarn application
Apex as yarn applicationApex as yarn application
Apex as yarn application
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
 

Viewers also liked

Classification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj SenClassification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj SenArvind Surve
 
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by  Alexandre V EvfimievskiClustering and Factorization using Apache SystemML by  Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by Alexandre V EvfimievskiArvind Surve
 
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...Arvind Surve
 
Clustering and Factorization using Apache SystemML by Prithviraj Sen
Clustering and Factorization using Apache SystemML by  Prithviraj SenClustering and Factorization using Apache SystemML by  Prithviraj Sen
Clustering and Factorization using Apache SystemML by Prithviraj SenArvind Surve
 
Regression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V EvfimievskiRegression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V EvfimievskiArvind Surve
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...Arvind Surve
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and InvocationArvind Surve
 
Amia tb-review-11
Amia tb-review-11Amia tb-review-11
Amia tb-review-11Russ Altman
 
Parallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLParallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLJanani C
 
南投縣發祥國小辦理教育優先區計畫實施情形考核表
南投縣發祥國小辦理教育優先區計畫實施情形考核表南投縣發祥國小辦理教育優先區計畫實施情形考核表
南投縣發祥國小辦理教育優先區計畫實施情形考核表Shi Guo Xian
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissSpark Summit
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLJen Aman
 
The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and Spark
The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and SparkThe Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and Spark
The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and SparkAkshay Rai
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkAdam Gibson
 
критерії оцінювання
критерії оцінюваннякритерії оцінювання
критерії оцінюванняartischenkonatalia
 

Viewers also liked (20)

Classification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj SenClassification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj Sen
 
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by  Alexandre V EvfimievskiClustering and Factorization using Apache SystemML by  Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
 
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
 
Clustering and Factorization using Apache SystemML by Prithviraj Sen
Clustering and Factorization using Apache SystemML by  Prithviraj SenClustering and Factorization using Apache SystemML by  Prithviraj Sen
Clustering and Factorization using Apache SystemML by Prithviraj Sen
 
Regression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V EvfimievskiRegression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V Evfimievski
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...
 
Resume sachin kuckian
Resume sachin kuckianResume sachin kuckian
Resume sachin kuckian
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and Invocation
 
Amia tb-review-11
Amia tb-review-11Amia tb-review-11
Amia tb-review-11
 
Parallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLParallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemML
 
南投縣發祥國小辦理教育優先區計畫實施情形考核表
南投縣發祥國小辦理教育優先區計畫實施情形考核表南投縣發祥國小辦理教育優先區計畫實施情形考核表
南投縣發祥國小辦理教育優先區計畫實施情形考核表
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick Reiss
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and Spark
The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and SparkThe Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and Spark
The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and Spark
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Inside Apache SystemML
Inside Apache SystemMLInside Apache SystemML
Inside Apache SystemML
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
 
критерії оцінювання
критерії оцінюваннякритерії оцінювання
критерії оцінювання
 

Similar to Apache SystemML Architecture by Niketan Panesar

Challenges of Building a First Class SQL-on-Hadoop Engine
Challenges of Building a First Class SQL-on-Hadoop EngineChallenges of Building a First Class SQL-on-Hadoop Engine
Apache SystemML Architecture by Niketan Panesar

  • 2. Agenda • High-level Design & APIs • Architecture Overview • Tooling • Important links 2 From http://systemml.apache.org/
  • 3. Agenda • High-level Design & APIs • Architecture Overview • Language • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying datasizes • Tooling • Important links 3
  • 5. SystemML Design 5 DML (Declarative Machine Learning Language) Hadoop or Spark Cluster (scale-out) since 2010 In-Memory Single Node (scale-up) since 2012 since 2015 DML Scripts Data CP + b sb _mVar1 SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE CP * y _mVar2 _mVar3 Hybrid execution plans* SystemML 1. On disk/HDFS 2. RDD/DataFrame 3. double [] []
  • 6. SystemML Design 6 Hadoop or Spark Cluster (scale-out) since 2010 In-Memory Single Node (scale-up) since 2012 DML Scripts Data SystemML 1. On disk/HDFS 2. RDD/DataFrame 3. double [] [] Command line API* (also MLContext*) -exec hadoop
  • 7. SystemML Design 7 Hadoop or Spark Cluster (scale-out) In-Memory Single Node (scale-up) since 2012 DML Scripts Data SystemML 1. On disk/HDFS 2. RDD/DataFrame 3. double [] [] Two options: 1. -exec singlenode 2. Use standalone jar (preserves rewrites, but may spawn Local MR jobs) Command line API* (also MLContext*)
  • 8. SystemML Design 8 Spark Cluster (scale-out) In-Memory Single Node (scale-up) since 2012 since 2015 DML Scripts Data SystemML 1. On disk/HDFS 2. RDD/DataFrame 3. double [] [] Command line API* (also MLContext*)
  • 9. SystemML Design 9 Spark Cluster (scale-out) In-Memory Single Node (scale-up) since 2012 since 2015 DML Scripts Data SystemML 1. On disk/HDFS 2. RDD/DataFrame 3. double [] [] MLContext API - Java/Python/Scala https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
  • 10. SystemML Design 10 In-Memory Single Node (scale-up) since 2012 DML Scripts Data SystemML 1. On disk/HDFS 2. RDD/DataFrame 3. double [] [] JMLC API https://apache.github.io/incubator-systemml/jmlc.html
  • 12. From DML to Execution Plan 12 Hadoop or Spark Cluster (scale-out) In-Memory Single Node (scale-up) DML Scripts DML (Declarative Machine Learning Language) since 2010 since 2012 since 2015 Data CP + b sb _mVar1 SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE CP * y _mVar2 _mVar3 Hybrid execution plans* SystemML
  • 13. From DML to Execution Plan 13 Hadoop or Spark Cluster (scale-out) In-Memory Single Node (scale-up) Runtime Compiler Language DML Scripts DML (Declarative Machine Learning Language) since 2010 since 2012 since 2015 Data CP + b sb _mVar1 SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE CP * y _mVar2 _mVar3 Hybrid execution plans* Assuming an example dataset X: 100M x 500, y: 100M x 1, b/sb: 500 x 1
  • 15. SystemML Compilation Chain
      • Parsing
          • Parse input DML/PyDML using ANTLR v4 (see Dml.g4 and Pydml.g4)
          • Perform syntactic validation
          • Construct DMLProgram (=> list of Statement and function blocks)
      • Live Variable Analysis
          • Classic dataflow analysis
          • A variable is "live" if it holds a value that may be needed in the future
          • Dead code elimination
      • Semantic Validation
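The live variable analysis and dead code elimination steps can be illustrated with a minimal Python sketch. It is not SystemML's implementation: the `Stmt` record and the `eliminate_dead_code` helper are hypothetical names introduced here, and the sketch only handles straight-line code (loops and branches would need an iterative fixpoint over a control-flow graph).

```python
from typing import FrozenSet, NamedTuple, Optional


class Stmt(NamedTuple):
    """Hypothetical statement record (not a SystemML class)."""
    text: str                   # source text of the statement
    target: Optional[str]       # variable assigned by the statement, if any
    uses: FrozenSet[str]        # variables read by the statement
    side_effect: bool = False   # True for statements such as print() or write()


def eliminate_dead_code(stmts, live_out=frozenset()):
    """Backward pass over straight-line code.

    A statement is kept if it has a side effect or if its target is live,
    i.e. its value may still be needed by a later statement.
    """
    live = set(live_out)
    kept = []
    for s in reversed(stmts):
        if s.side_effect or s.target in live:
            live.discard(s.target)   # the definition kills the variable
            live.update(s.uses)      # its inputs become live before it
            kept.append(s)
    kept.reverse()
    return kept


# DML-like example: n and tmp are never needed, so they are removed.
script = [
    Stmt("X = read($1);", "X", frozenset()),
    Stmt("n = nrow(X);", "n", frozenset({"X"})),
    Stmt("s = sum(X);", "s", frozenset({"X"})),
    Stmt("tmp = s * 2;", "tmp", frozenset({"s"})),
    Stmt("print(s);", None, frozenset({"s"}), side_effect=True),
]
kept_texts = [s.text for s in eliminate_dead_code(script)]
print(kept_texts)
```

A single backward pass is enough here because a statement only becomes live through statements that follow it.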
  • 16. SystemML Compilation Chain 16 • Dataflow in DAGs of operations on matrices, frames, and scalars • Choosing from alternative execution plans based on memory and cost estimates • Operator ordering & selection; hybrid plans
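To make the idea of memory-based plan selection concrete, here is a simplified Python sketch applied to the example dataset (X: 100M x 500, y: 100M x 1, b/sb: 500 x 1). The rule used (run in CP if the dense inputs and output fit into a local budget, otherwise on Spark) is a deliberate simplification, not SystemML's actual optimizer; the function names and the budget of 70% of a 20 GB driver heap are assumptions made for illustration.

```python
DOUBLE_BYTES = 8  # dense matrices are assumed to store 8-byte doubles


def dense_matrix_bytes(rows, cols):
    """Worst-case (dense) in-memory size of a rows x cols matrix."""
    return rows * cols * DOUBLE_BYTES


def choose_exec_type(operand_shapes, local_budget_bytes):
    """Hypothetical rule: CP if all inputs plus the output fit locally."""
    total = sum(dense_matrix_bytes(r, c) for r, c in operand_shapes)
    return "CP" if total <= local_budget_bytes else "SPARK"


# Assumption: the local budget is 70% of a 20 GB driver heap.
budget = int(0.7 * 20 * 1024**3)

n, m = 100_000_000, 500
plan = [
    # (operation, shapes of inputs and output)
    ("b + sb", [(m, 1), (m, 1), (m, 1)]),
    ("X %*% (b + sb)", [(n, m), (m, 1), (n, 1)]),
    ("y * (X %*% (b + sb))", [(n, 1), (n, 1), (n, 1)]),
]
exec_types = [choose_exec_type(shapes, budget) for _, shapes in plan]
for (op, _), exec_type in zip(plan, exec_types):
    print(f"{exec_type:5s} {op}")
```

Under these assumptions only the matrix-vector product over X (about 400 GB dense) exceeds the local budget, which is consistent with the hybrid plan shown on the slides (CP, SPARK mapmm, CP).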
  • 17. SystemML Compilation Chain 17 * Discussed later in Tooling spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
  • 18. SystemML Compilation Chain 18 • Low-level physical execution plan (LOP DAGs) • Over key-value pairs for MR • Over RDDs for Spark • “Piggybacking” operations into a minimal number of Map-Reduce jobs
  • 19. SystemML Compilation Chain 19 Spark CP + b sb _mVar1 SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE CP * y _mVar2 _mVar3
  • 20. SystemML Runtime
      • Hybrid Runtime
          • CP: single machine operations & orchestrate jobs
          • MR: generic Map-Reduce jobs & operations
          • SP: Spark Jobs
      • Numerically stable operators
      • Dense / sparse matrix representation
      • Multi-Level buffer pool (caching) to evict in-memory objects
      • Dynamic Recompilation for initial unknowns
      • Diagram components: Control Program, Runtime Program, Buffer Pool, ParFor Optimizer/Runtime, MR Inst, Spark Inst, CP Inst, Recompiler, DFS IO, Mem/FS IO, Generic MR Jobs, MatrixBlock Library (single/multi-threaded)
  • 21. From DML to Execution Plan 21 Hadoop or Spark Cluster (scale-out) In-Memory Single Node (scale-up) Runtime Compiler Language DML Scripts DML (Declarative Machine Learning Language) since 2010 since 2012 since 2015 Data CP + b sb _mVar1 SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE CP * y _mVar2 _mVar3 Hybrid execution plans* Varying data sizes LinearRegression.dml
  • 22. A Data Scientist – Linear Regression
      • Model: $Xw \approx y$, where X holds the explanatory (independent) variables, y is the predicted (dependent) variable, and w is the model
      • Optimization problem: $w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2$
      • Conjugate Gradient Method:
          • Start off with the (negative) gradient
          • For each step:
              1. Move to the optimal point along the chosen direction
              2. Recompute the gradient
              3. Project it onto the subspace conjugate* to all prior directions
              4. Use this as the next direction
          • (* conjugate = orthogonal given A as the metric, where $A = X^T X + \lambda I$)
      • Script annotations: initialize, initial direction, iterate until convergence, step size, update w, next direction, accuracy measures
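The conjugate gradient steps above can be sketched in plain Python as follows. This is an illustrative, unoptimized version on nested lists, not the LinearRegression.dml script itself; the helper names (`dot`, `mat_vec`, `t_mat_vec`, `linreg_cg`) and the default tolerance are assumptions. It solves $(X^T X + \lambda I) w = X^T y$ without ever materializing $X^T X$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(X, v):
    """X %*% v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in X]


def t_mat_vec(X, u):
    """t(X) %*% u without forming the transpose."""
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, xij in enumerate(row):
            out[j] += xij * ui
    return out


def linreg_cg(X, y, lam=0.0, tol=1e-12, max_iter=None):
    m = len(X[0])
    w = [0.0] * m                        # initialize
    r = [-g for g in t_mat_vec(X, y)]    # gradient at w = 0 (up to a factor of 2)
    p = [-ri for ri in r]                # initial direction: negative gradient
    norm_r2 = dot(r, r)
    for _ in range(max_iter if max_iter is not None else m):
        if norm_r2 <= tol:               # iterate until convergence
            break
        # q = A p with A = t(X) X + lam * I
        q = [qi + lam * pi for qi, pi in zip(t_mat_vec(X, mat_vec(X, p)), p)]
        alpha = norm_r2 / dot(p, q)      # step size: optimum along p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]    # update w
        r = [ri + alpha * qi for ri, qi in zip(r, q)]    # recompute the gradient
        old_norm_r2 = norm_r2
        norm_r2 = dot(r, r)
        beta = norm_r2 / old_norm_r2
        # next direction: gradient made conjugate to the prior directions
        p = [-ri + beta * pi for ri, pi in zip(r, p)]
    return w


# Tiny example: y = 1*x1 + 2*x2 exactly, so the fit should be close to [1, 2].
w = linreg_cg([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
print(w)
```

In exact arithmetic the method needs at most as many iterations as X has columns, which is why the default iteration limit is the column count.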
  • 23. SystemML – Run LinReg CG on Spark
      • Cluster: 20 GB driver on 16c, 6 x 55 GB executors
      • Inputs: X with 100M rows and 10, 100, 1,000, or 10,000 columns; y: 100M x 1
      • X: 100M x 10 (8 GB): multithreaded single node (tMMp)
      • X: 100M x 100 (80 GB): hybrid plan with RDD caching and fused operator; x.persist(); ... X.mapValues(tMMv).reduce() ...
      • X: 100M x 1,000 (800 GB): hybrid plan with RDD out-of-core (spilling) and fused operator; x.persist(); ... X.mapValues(tMMv).reduce() ...
      • X: 100M x 10,000 (8 TB): hybrid plan with RDD out-of-core and different operators; x.persist(); ... // 2 MxV mult with broadcast, mapToPair, and reduceByKey ...
  • 24. Agenda • Architecture Overview • Language & APIs • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying datasizes • Tooling • Important links 24
  • 26. Explain (Understanding Execution Plans)
      • Overview
          • Shows generated execution plan (at different compilation steps)
          • Introduced 05/2014 for internal usage
          • Important tool for understanding/debugging optimizer choices!
      • Usage
          • hadoop jar SystemML.jar -f test.dml -explain [hops | runtime | hops_recompile | runtime_recompile]
          • Hops: program w/ hop dags after optimization
          • Runtime (default): program w/ generated runtime instructions
          • Hops_recompile: see hops + hop dag after every recompile
          • Runtime_recompile: see runtime + generated runtime instructions after every recompile
  • 27. Explain: Understanding HOP DAGs (simple DML)
      • Each HOP line shows:
          • HOP ID
          • HOP opcode
          • HOP input data dependencies (via HOP IDs)
          • HOP output matrix characteristics (rlen, clen, brlen, bclen, nnz)
          • Hop memory estimates (all inputs, intermediates, output → operation mem)
          • Hop execution type (CP/SP/MR)
          • Optional: indicators of reblock/checkpointing (caching) of hop outputs
      • Flags: -explain hops, -explain recompile_hops
      • Invocation (Spark): spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
      • Annotation: Broadcast mem budget
  • 28. Explain: Understanding HOP DAGs (entire script)
      • Example DML Script (Simplified LinregDS):
          X = read($1);
          y = read($2);
          intercept = $3;
          lambda = $4;
          if( intercept == 1 ) {
            ones = matrix(1, nrow(X), 1);
            X = append(X, ones);
          }
          I = matrix(1, ncol(X), 1);
          A = t(X) %*% X + diag(I*lambda);
          b = t(X) %*% y;
          beta = solve(A, b);
          write(beta, $5);
      • Invocation: hadoop jar SystemML.jar -f linregds.dml -args X y 0 0 beta
      • Scenario: X: 100,000 x 1,000, sparsity 1.0; y: 100,000 x 1, sparsity 1.0 (800 MB, 200+ GFlop)
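The scenario figures (800 MB, 200+ GFlop) can be checked with a few lines of arithmetic. The sketch below assumes dense 8-byte doubles and counts one multiply and one add per inner-product term; the function names are illustrative only.

```python
DOUBLE_BYTES = 8  # assumption: dense matrix of 8-byte doubles


def dense_bytes(rows, cols):
    return rows * cols * DOUBLE_BYTES


def matmult_flops(m, k, n):
    """Floating point operations of an (m x k) %*% (k x n) product."""
    return 2 * m * k * n


rows, cols = 100_000, 1_000

x_bytes = dense_bytes(rows, cols)
x_mb = x_bytes / 10**6     # decimal megabytes, as quoted in the scenario
x_mib = x_bytes / 2**20    # binary megabytes (MiB)

flops_xtx = matmult_flops(cols, rows, cols)   # t(X) %*% X
flops_xty = matmult_flops(cols, rows, 1)      # t(X) %*% y
total_gflop = (flops_xtx + flops_xty) / 10**9

print(f"X: {x_mb:.0f} MB ({x_mib:.0f} MiB), matrix products: {total_gflop:.1f} GFlop")
```

The size in binary megabytes (about 763) is the value that an explain output would report for X if memory estimates are given in MiB, and the two matrix products alone account for slightly more than 200 GFlop before `solve` is counted.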
  • 29. Explain: Understanding HOP DAGs (2)
      • Explain Hops:
          15/07/05 17:18:06 INFO api.DMLScript: EXPLAIN (HOPS):
          # Memory Budget local/remote = 57344MB/1434MB/1434MB
          # Degree of Parallelism (vcores) local/remote = 24/144/72
          PROGRAM
          --MAIN PROGRAM
          ----GENERIC (lines 1-4) [recompile=false]
          ------(10) PRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
          ------(11) TWrite X (10) [100000,1000,1000,1000,100000000] [763,0,0 -> 763MB], CP
          ------(21) PRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
          ------(22) TWrite y (21) [100000,1,1000,1000,100000] [1,0,0 -> 1MB], CP
          ------(24) TWrite intercept [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
          ------(26) TWrite lambda [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
          ----GENERIC (lines 11-16) [recompile=false]
          ------(42) TRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
          ------(52) r(t) (42) [1000,100000,1000,1000,100000000] [763,0,763 -> 1526MB]
          ------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP
          ------(43) TRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
          ------(59) ba(+*) (52,43) [1000,1,1000,1000,-1] [764,0,0 -> 764MB], CP
          ------(60) b(solve) (53,59) [1000,1,1000,1000,-1] [8,8,0 -> 15MB], CP
          ------(66) PWrite beta (60) [1000,1,-1,-1,-1] [0,0,0 -> 0MB], CP
      • Annotations: cluster characteristics (memory budget and degree of parallelism), program structure (incl. recompile), unrolled HOP DAG
      • Notes: if branch (lines 6-9) and regularization removed by rewrites
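When reading larger plans it can help to split each HOP line into the fields listed earlier (ID, opcode, inputs, output characteristics, memory estimates, execution type). The following Python sketch does this with a regular expression. It is a hypothetical helper written against the line format shown above, not a SystemML utility, and it may need adjusting for other explain output variants.

```python
import re

# Pattern for one HOP line, e.g.
# ------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP
HOP_LINE = re.compile(
    r"^\((?P<id>\d+)\)\s+"                      # HOP ID
    r"(?P<op>.+?)\s+"                           # opcode, possibly with a name
    r"(?:\((?P<inputs>[\d,]+)\)\s+)?"           # optional input HOP IDs
    r"\[(?P<dims>[-\d,]+)\]\s+"                 # rlen, clen, brlen, bclen, nnz
    r"\[(?P<mem>[\d,]+)\s*->\s*(?P<total>\d+)MB\]"  # memory estimates
    r"(?:,\s*(?P<exec>\w+))?$"                  # optional execution type
)


def parse_hop_line(line):
    """Return the fields of a HOP line as a dict, or None if it does not match."""
    match = HOP_LINE.match(line.strip().lstrip("-"))
    if match is None:
        return None
    dims = [int(v) for v in match.group("dims").split(",")]
    mem = [int(v) for v in match.group("mem").split(",")]
    inputs = match.group("inputs")
    return {
        "id": int(match.group("id")),
        "opcode": match.group("op"),
        "inputs": [int(v) for v in inputs.split(",")] if inputs else [],
        "dims": dict(zip(("rlen", "clen", "brlen", "bclen", "nnz"), dims)),
        "mem_mb": dict(zip(("inputs", "intermediates", "output"), mem)),
        "op_mem_mb": int(match.group("total")),
        "exec_type": match.group("exec"),
    }


hop = parse_hop_line(
    "------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP"
)
print(hop)
```

For the matrix multiplication HOP (53), the parsed fields show the two inputs (52, 42), an unknown number of non-zeros (-1), and an operation memory estimate of 1541 MB executed in CP.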
  • 30. Explain: Understanding Runtime Plans (1)
      • Explain Runtime (simplified filenames, removed rmvar):
          15/07/05 17:18:53 INFO api.DMLScript: EXPLAIN (RUNTIME):
          # Memory Budget local/remote = 57344MB/1434MB/1434MB
          # Degree of Parallelism (vcores) local/remote = 24/144/72
          PROGRAM ( size CP/MR = 25/0 )
          --MAIN PROGRAM
          ----GENERIC (lines 1-4) [recompile=false]
          ------CP createvar pREADX X false binaryblock 100000 1000 1000 1000 100000000
          ------CP createvar pREADy y false binaryblock 100000 1 1000 1000 100000
          ------CP assignvar 0.SCALAR.INT.true intercept.SCALAR.INT
          ------CP assignvar 0.0.SCALAR.DOUBLE.true lambda.SCALAR.DOUBLE
          ------CP cpvar pREADX X
          ------CP cpvar pREADy y
          ----GENERIC (lines 11-16) [recompile=false]
          ------CP createvar _mVar2 .../_t0/temp1 true binaryblock 1000 1000 1000 1000 -1
          ------CP tsmm X.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 24
          ------CP createvar _mVar3 .../_t0/temp2 true binaryblock 1 100000 1000 1000 100000
          ------CP r' y.MATRIX.DOUBLE _mVar3.MATRIX.DOUBLE
          ------CP createvar _mVar4 .../_t0/temp3 true binaryblock 1 1000 1000 1000 -1
          ------CP ba+* _mVar3.MATRIX.DOUBLE X.MATRIX.DOUBLE _mVar4.MATRIX.DOUBLE 24
          ------CP createvar _mVar5 .../_t0/temp4 true binaryblock 1000 1 1000 1000 -1
          ------CP r' _mVar4.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE
          ------CP createvar _mVar6 .../_t0/temp5 true binaryblock 1000 1 1000 1000 -1
          ------CP solve _mVar2.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE _mVar6.MATRIX.DOUBLE
          ------CP write _mVar6.MATRIX.DOUBLE .../beta.SCALAR.STRING.true textcell.SCALAR.STRING.true
      • Literally a string representation of runtime instructions
  • 31. Stats (Profiling Runtime Statistics)
      • Overview
          • Profiles and shows aggregated runtime statistics of potential bottlenecks
          • Introduced 01/2014 for internal usage, extension of buffer pool stats 01/2013
          • Important tool for understanding runtime characteristics and profiling/tuning system internals by developers
      • Usage
          • hadoop jar SystemML.jar -f test.dml -stats
  • 33. Debug (Script Debugging)
      • Overview
          • Script-level debugging by end-users (and developers)
          • Introduced 09/2014 as result of intern project
          • gdb-inspired command-line debugger interface
      • Usage
          • hadoop jar SystemML.jar -f test.dml -debug
  • 37. Important Links
      • Website: http://systemml.apache.org/
      • Interested in SystemML?
          • Go to https://github.com/apache/incubator-systemml and “Star it”
      • Want to contribute to SystemML?
          • See http://apache.github.io/incubator-systemml/contributing-to-systemml.html
          • List of issues: https://issues.apache.org/jira/browse/SYSTEMML/
          • Ask any of our PMC members for suggestions
      • Want to try out SystemML?
          • Laptop: http://apache.github.io/incubator-systemml/quick-start-guide.html (Does not require Hadoop/Spark installation)
          • Spark Cluster: http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html (Includes Jupyter/Zeppelin demo)