Analysis of Massive Data Streams Using R
CAEPIA 2015, Albacete, November 9, 2015

Antonio Salmerón (1), Helge Langseth (2), Anders L. Madsen (3,4), Thomas D. Nielsen (4)

(1) Dept. Mathematics, University of Almería, Spain
(2) Dept. Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
(3) Hugin Expert A/S, Aalborg, Denmark
(4) Dept. Computer Science, Aalborg University, Denmark
Outline
1. Introduction
   o Data streams
   o Challenges when processing data streams
   o Why Bayesian networks?
   o The AMIDST project
2. Bayesian networks
   o Static and dynamic models
   o Inference and learning
3. Exploratory analysis
   o Exploratory time series analysis in R
   o Report generation: LaTeX + R
4. The Ramidst package
   o The AMIDST toolbox
   o Using the AMIDST toolbox from R
Part I: Introduction
Data Streams everywhere
• Unbounded flows of data are generated daily:
  • Social networks
  • Network monitoring
  • Financial/banking industry
  • …
Data Stream Processing
• Processing data streams is challenging:
  – They do not fit in main memory
  – Continuous model updating
  – Continuous inference / prediction
  – Concept drift
Processing Massive Data Streams
• Scalability is a main issue:
  • Scalable computing infrastructure
  • Scalable models and algorithms
Why Bayesian networks?
§ Example:
  § Stream of sensor measurements about temperature and smoke presence in a given geographical area.
  § The stream is analysed to detect the presence of fire (event detection problem).
Why Bayesian networks?
§ The problem can be approached as an anomaly detection task (outliers).
§ A commonly used method is streaming k-means.
[Figure: an anomalous observation highlighted among the stream data]
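To make the k-means idea concrete, here is a minimal batch sketch in base R (ours, not from the original slides): observations lying far from every cluster centre fitted on "normal" data are flagged as anomalies. The simulated data and the 0.99 quantile threshold are purely illustrative; a true streaming variant would update the centres incrementally instead of refitting kmeans().

# Batch sketch of k-means based anomaly detection (illustrative only).
set.seed(1)
normal  <- matrix(rnorm(200 * 2, mean = 20, sd = 1), ncol = 2)   # "normal" sensor readings
anomaly <- matrix(rnorm(5 * 2,   mean = 35, sd = 1), ncol = 2)   # fire-like readings
x <- rbind(normal, anomaly)

km <- kmeans(normal, centers = 3)                                # fit on normal data only
dist_to_centre <- function(p) min(sqrt(colSums((t(km$centers) - p)^2)))
d <- apply(x, 1, dist_to_centre)
threshold <- quantile(apply(normal, 1, dist_to_centre), 0.99)    # illustrative cut-off
which(d > threshold)                                             # indices flagged as anomalous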
Why Bayesian networks?
§ Often, data streams are handled using black-box models:
  Stream → Black-box model → Predictions
§ Pros:
  § No need to understand the problem
§ Cons:
  § Hyper-parameters to be tuned
  § Black-box models can seldom explain away
Why Bayesian networks?
§ Bayesian networks:
  § Open-box models
  § Encode prior knowledge
  § Continuous and discrete variables (CLG networks)
§ Example: a network with Fire, its children Temp and Smoke, temperature sensor readings T1, T2, T3 and smoke sensor reading S1; the query of interest is p(Fire = true | t1, t2, t3, s1)
Why Bayesian networks?
Stream → Open-box models → Predictions
Why Bayesian networks?
Stream → Open-box models → Predictions
Black-box inference engine (multi-core parallelization)
The AMIDST project
§ FP7-funded EU project
§ Large number of variables
§ Data arriving in streams
§ Based on hybrid Bayesian networks
§ Open source toolbox with learning and inference capabilities
§ Two use cases provided by industrial partners:
  § Prediction of maneuvers in highway traffic (Daimler)
  § Risk prediction in credit operations and customer profiling (BCC)
§ http://www.amidst.eu
[Embedded poster (translated from Spanish): "Modelling with hybrid dynamic Bayesian networks — results obtained in the prediction of traffic maneuvers", based on two-time-slice dynamic Bayesian networks and Markov chains. Key points: analysing the trend is preferable to fixing a probability threshold; this makes it possible to predict maneuvers earlier than with other methods; dynamic Bayesian networks with approximate inference algorithms are a suitable tool for the difficulties of this problem; the AMIDST toolbox enables real-time data analysis with dynamic Bayesian networks, providing an adequate framework for tackling it; these and other contributions are expected to reduce the number of traffic-accident casualties, towards the goal of a fully safe vehicle.]
Part II: Bayesian networks
Definition
§ Formally, a Bayesian network consists of
  § A directed acyclic graph (DAG) where each node is a random variable
  § A set of conditional probability distributions, one for each variable conditional on its parents in the DAG
§ For a set of variables X = {X_1, ..., X_N}, the joint distribution factorizes as

    p(X) = \prod_{i=1}^{N} p(X_i | Pa(X_i)),

  where Pa(X_i) ⊂ X \ {X_i} denotes the parents of X_i in the DAG
§ The factorization allows local computations
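As a toy illustration of the factorization (ours, not part of the slides), a base-R sketch for a three-variable chain A → B → C with made-up conditional probability tables:

# Joint probability via the factorization p(a, b, c) = p(a) * p(b | a) * p(c | b).
# All numbers are illustrative.
p_A  <- c("0" = 0.7, "1" = 0.3)
p_BA <- matrix(c(0.9, 0.1,    # p(B | A = 0)
                 0.2, 0.8),   # p(B | A = 1)
               nrow = 2, byrow = TRUE,
               dimnames = list(A = c("0", "1"), B = c("0", "1")))
p_CB <- matrix(c(0.6, 0.4,    # p(C | B = 0)
                 0.3, 0.7),   # p(C | B = 1)
               nrow = 2, byrow = TRUE,
               dimnames = list(B = c("0", "1"), C = c("0", "1")))

joint <- function(a, b, c) unname(p_A[a] * p_BA[a, b] * p_CB[b, c])
joint("1", "0", "1")   # p(A = 1, B = 0, C = 1) = 0.3 * 0.2 * 0.4 = 0.024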
Reading independencies
Independence relations can be read off from the structure.

There are three types of connections:
§ Serial: A → B → C
§ Diverging: A ← B → C
§ Converging: A → B ← C
 
Reading independencies. Example
[Network: Fire with children Temp and Smoke; temperature sensor readings T1, T2, T3 and smoke sensor reading S1]
• Knowing the temperature with certainty makes the temperature sensor readings and the event of fire independent
• The smoke sensor reading is also irrelevant to the event of fire if Smoke is known for sure
 
Reading independencies. Example
[Same network, extended with a Sun variable as an additional parent of Temp]
• Knowing the temperature with certainty makes the temperature sensor readings and the event of fire independent
• The smoke sensor reading is also irrelevant to the event of fire if Smoke is known for sure
• If there is no info about Temp or sensor readings, Sun and Fire are independent
Hybrid Bayesian networks
• In a hybrid Bayesian network, discrete and continuous variables coexist
• Mixtures of truncated basis functions (MoTBFs) have been successfully used in this context (Langseth et al. 2012)
  • Mixtures of truncated exponentials (MTEs)
  • Mixtures of polynomials (MoPs)
• MoTBFs support efficient inference and learning in a static setting
• Learning from streams is more problematic
  • The reason is that they do not belong to the exponential family
The exponential family
• A family of probability (density or mass) functions belongs to the k-parametric exponential family if it can be expressed as

    f(x; θ) = exp{ \sum_{i=1}^{k} Q_i(θ) T_i(x) + D(θ) + S(x) }

• The T_i functions are the sufficient statistics for the unknown parameters, i.e., they contain all the information in the sample that is relevant for estimating the parameters
• They have dimension 1
• We can "compress" all the information in the stream so far as a single number for each parameter
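The "compression" remark can be illustrated with a short base-R sketch (our own, for a simple Gaussian stream): the sufficient statistics are just the count, the sum and the sum of squares, updated per observation, so the raw stream never needs to be stored.

# Streaming sufficient statistics for a Gaussian stream (illustrative only).
stats <- c(n = 0, s = 0, s2 = 0)
update <- function(stats, x) stats + c(1, x, x^2)   # one cheap update per observation

set.seed(7)
for (x in rnorm(10000, mean = 3, sd = 2)) stats <- update(stats, x)

mean_hat <- stats["s"] / stats["n"]
var_hat  <- stats["s2"] / stats["n"] - mean_hat^2
c(mean_hat, var_hat)   # close to the true values 3 and 4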
  
Hybrid Bayesian networks. CLGs
A Conditional Linear Gaussian (CLG) network is a hybrid Bayesian network where
• The conditional distribution of each discrete variable X_D given its parents is a multinomial
• The conditional distribution of each continuous variable Z with discrete parents X_D and continuous parents X_C is

    p(z | X_D = x_D, X_C = x_C) = N(z; α(x_D) + β(x_D)^T x_C, σ(x_D))

  for all x_D and x_C, where α and β are the coefficients of a linear regression model of Z given X_C, potentially different for each configuration of X_D.

CLGs belong to the exponential family.
CLGs: Example
[Network over discrete Y and S and continuous W, T, U, with Y → W, W → T, S → T and W → U]
P(Y) = (0.5, 0.5)
P(S) = (0.1, 0.9)
f(w | Y = 0) = N(w; 1, 1)
f(w | Y = 1) = N(w; 2, 1)
f(t | w, S = 0) = N(t; w, 1)
f(t | w, S = 1) = N(t; w, 1)
f(u | w) = N(u; w, 1)

§ Limitation: discrete nodes are not allowed to have continuous parents
§ This is not a big problem for Bayesian classifiers
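A quick way to get a feel for the CLG example above is to forward-sample it. The base-R sketch below is our own illustration (it assumes the first entry of each discrete distribution is the probability of state 0):

# Forward sampling from the CLG example (our own sketch, not toolbox code).
set.seed(42)
n  <- 10000
Y  <- rbinom(n, 1, 0.5)                               # P(Y = 1) = 0.5
S  <- rbinom(n, 1, 0.9)                               # P(S = 1) = 0.9, assuming (0.1, 0.9) = (P(S=0), P(S=1))
W  <- rnorm(n, mean = ifelse(Y == 0, 1, 2), sd = 1)   # f(w | Y)
T_ <- rnorm(n, mean = W, sd = 1)                      # f(t | w, S): same mean for both S values here
U  <- rnorm(n, mean = W, sd = 1)                      # f(u | w)

mean(W[Y == 1])   # ≈ 2, matching f(w | Y = 1) = N(w; 2, 1)
var(T_)           # ≈ 2.25: Var(W) = 1 + 0.25 (mixture of the two Y regimes) plus noise variance 1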
  
	
  
	
  
Bayesian network classifiers
§ The structure is usually restricted
§ There is a distinguished (discrete) variable called the class while the rest are called features
§ Examples: naive Bayes and tree-augmented network (TAN)
[Figure: structure of naive Bayes (a) and TAN (b) classifiers — in naive Bayes all features X1, ..., Xn are children of the class C only; a TAN additionally allows a tree over the features, chosen as a maximum weight spanning tree]
Bayesian network classifiers
§ The class value is determined as

    c* = arg max_{c ∈ Ω_C} p(c | x_1, ..., x_n),

  where Ω_C denotes the set of all possible values of C
§ In the case of naive Bayes,

    p(c | x_1, ..., x_n) ∝ p(c) \prod_{i=1}^{n} p(x_i | c),

  so instead of one n-dimensional conditional density, only n one-dimensional conditional densities must be estimated
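A compact base-R illustration of the naive Bayes rule above, with Gaussian class-conditional densities estimated from simulated data (our own sketch, not toolbox code):

# Naive Bayes by hand: p(c | x) ∝ p(c) * prod_i p(x_i | c), Gaussian features.
set.seed(1)
n  <- 500
cl <- rbinom(n, 1, 0.4)                              # class labels
x1 <- rnorm(n, mean = ifelse(cl == 1, 2, 0))         # feature 1
x2 <- rnorm(n, mean = ifelse(cl == 1, -1, 1))        # feature 2

prior <- prop.table(table(cl))                       # p(c)
pars  <- function(v) tapply(v, cl, function(z) c(m = mean(z), s = sd(z)))
p1 <- pars(x1); p2 <- pars(x2)                       # per-class mean/sd of each feature

posterior <- function(x1new, x2new) {
  scores <- sapply(names(prior), function(cls)
    prior[cls] * dnorm(x1new, p1[[cls]]["m"], p1[[cls]]["s"]) *
                 dnorm(x2new, p2[[cls]]["m"], p2[[cls]]["s"]))
  scores / sum(scores)                               # normalize over the classes
}
posterior(1.8, -0.7)                                 # posterior class probabilities for a new case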
Reasoning over time: Dynamic Bayesian networks
§ Temporal reasoning can be accommodated within BNs
§ Variables are indexed over time, giving rise to dynamic Bayesian networks: X_t denotes the state of the system at time t, and X_{a:b} ≡ X_a, X_{a+1}, ..., X_b
§ We have to model the joint distribution over time, which has the natural cascade decomposition

    p(X_{1:T}) = \prod_{t=1}^{T} p(X_t | X_{1:t-1}),

  where p(X_t | X_{1:t-1}) is equal to p(X_1) for t = 1; as t increases, p(X_t | X_{1:t-1}) becomes intractable
§ Dynamic BNs reduce the factorization complexity by adopting the Markov assumption:

    p(X_t | X_{1:t-1}) = p(X_t | X_{t-V:t-1}),

  where V ≥ 1 is the order of the Markov chain
Reasoning over time: Dynamic Bayesian networks
§ DBN assuming the third-order Markov assumption
§ DBN assuming the first-order Markov assumption
[Figure: examples of DBNs satisfying a third-order (above) and a first-order (below) Markov property]
Particular cases of Dynamic Bayesian networks
§ Hidden Markov models
§ The joint distribution of the hidden (X) and observed (Y) variables is

    P(X_{1:T}, Y_{1:T}) = \prod_{t=1}^{T} P(X_t | X_{t-1}) P(Y_t | X_t)

[Figure: BN structure corresponding to an HMM]
Particular cases of Dynamic Bayesian networks
§ Input-output hidden Markov models
§ Linear dynamic systems: switching Kalman filter

An extension of the HMM is the so-called input-output hidden Markov model (IOHMM), which incorporates an extra top layer of input variables Y'_{1:T}, either continuous or discrete; the existing HMM layer of observed variables Y_{1:T} is referred to as the output variables. IOHMMs are usually employed in supervised classification, where both input and output variables are known during training but only the former during testing. In AMIDST both sets are always known, so inference is only performed to predict the latent variables: the input variables are introduced as a way to "relax" the stationarity assumption by explicitly introducing a dependency on observed information at each time slice.

DBNs have likewise been extended to continuous and hybrid domains. In purely continuous domains where the variables follow linear Gaussian distributions, the DBN corresponds to (a factorized version of) a Kalman filter (KF), with the same structure as the HMM but with all variables continuous. For non-linear domains, the dynamics are often approximated through, e.g., the extended Kalman filter, which models the system as locally linear in the mean of the current state distribution. The switching Kalman filter (SKF) instead adds an extra discrete state variable that weights a combination of linear sub-models, representing the belief state as a mixture of Gaussians and thereby coping, to some extent, with violations of both the linearity and the Gaussian-noise assumptions.
Two-time slice Dynamic Bayesian networks (2T-DBN)
§ They constitute the main dynamic model in AMIDST
§ A 2T-DBN is characterised by an initial model, representing the initial joint distribution of the process, and a transition model, a standard BN repeated over time; it satisfies both the first-order Markov assumption and the stationarity assumption
§ The transition distribution is represented as

    p(X_{t+1} | X_t) = \prod_{X_{t+1} ∈ X_{t+1}} p(X_{t+1} | Pa(X_{t+1})),

  where Pa(X_{t+1}) refers to the set of parents of the variable X_{t+1} in the transition model
[Figure: BN structure corresponding to a 2T-DBN]
Inference in CLG networks
§ There are three ways of querying a BN
  § Belief updating (probability propagation)
  § Maximum a posteriori (MAP)
  § Most probable explanation (MPE)
Inference in CLG networks
Querying a Bayesian network (I)
§ Probabilistic inference: computing the posterior distribution of a target variable,

    p(x_i | x_E) = ( \sum_{x_D} \int_{x_C} p(x, x_E) dx_C ) / ( \sum_{x_{D_i}} \int_{x_{C_i}} p(x, x_E) dx_{C_i} )
Inference in CLG networks
Querying a Bayesian network (II)
§ Maximum a posteriori (MAP): for a set of target variables X_I, the goal is to compute

    x*_I = arg max_{x_I} p(x_I | X_E = x_E),

  where p(x_I | X_E = x_E) is obtained by first marginalizing out from p(x) the variables not in X_I and not in X_E
§ Most probable explanation (MPE): a particular case of MAP where X_I includes all the unobserved variables
Probability propagation in CLG networks: Importance sampling
• Let θ denote the numerator of the probabilistic inference query above, i.e. θ = \int_a^b h(x_i) dx_i, with

    h(x_i) = \sum_{x_D ∈ Ω_{X_D}} \int_{x_C ∈ Ω_{X_C}} p(x; x_E) dx_C

• Then,

    θ = \int_a^b h(x_i) dx_i = \int_a^b ( h(x_i) / p*(x_i) ) p*(x_i) dx_i = E_{p*}[ h(X*_i) / p*(X*_i) ],

  where p* is a probability density function on (a, b) called the sampling distribution, and X*_i is a random variable with density p*
• Therefore, we have transformed the problem of probability propagation into estimating the expected value of a random variable from a sample drawn from a distribution of our own choice
Probability propagation in CLG networks: Importance sampling
• The expected value can be estimated using a sample mean estimator. Let X*_i^(1), ..., X*_i^(m) be a sample drawn from p*. Then a consistent, unbiased estimator of θ is given by

    θ̂_1 = (1/m) \sum_{j=1}^{m} h(X*_i^(j)) / p*(X*_i^(j))

• As θ̂_1 is unbiased, the error of the estimation is determined by its variance:

    Var(θ̂_1) = Var( (1/m) \sum_{j=1}^{m} h(X*_i^(j)) / p*(X*_i^(j)) ) = (1/m²) \sum_{j=1}^{m} Var( h(X*_i^(j)) / p*(X*_i^(j)) )

• In AMIDST, the sampling distribution is formed by the conditional distributions in the network (evidence weighting)
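A generic base-R sketch of the estimator θ̂_1 (ours, not the AMIDST implementation): estimating a Gaussian integral over (a, b) = (1, 5) with a uniform sampling distribution p*.

# Importance sampling sketch: theta = \int_a^b h(x) dx with h the N(0,1) density.
set.seed(3)
a <- 1; b <- 5; m <- 1e5
x_star <- runif(m, a, b)                        # sample from the sampling distribution p*
w      <- dnorm(x_star) / dunif(x_star, a, b)   # h(x*) / p*(x*)
theta_hat <- mean(w)                            # the estimator theta_hat_1
theta_hat                                       # ≈ pnorm(5) - pnorm(1) ≈ 0.1586
var(w) / m                                      # sample-based estimate of Var(theta_hat_1)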
Probability propagation in CLG networks: Importance sampling
[Diagram: the input stream is distributed (S. Dist) across computational units (C.U.); the Map phase performs sample generation on each unit and the Reduce phase combines the sufficient statistics into the output stream]
Probability propagation in CLG networks: Importance sampling
[Plot: response for an input stream with a network of 500 variables]
Probability propagation in CLG networks: Importance sampling
[Plot: response for an input stream with a network of 10 variables]
MAP in CLG networks
MAP is similar to probability propagation, but:
• First marginalize out by sum/integral (sum phase)
• Then maximize (max phase)

Constrained order → higher complexity
MAP in CLG networks
MAP in the AMIDST Toolbox:
• Hill climbing (global and local change)
• Simulated annealing (global and local change)
• Sampling
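For intuition, a generic hill-climbing sketch in base R (ours, not the toolbox code): local moves flip one discrete value at a time and are kept only if the scored configuration improves; score() stands in for the unnormalized posterior of the MAP query. In the toolbox this kind of search is combined with multiple starting points, as the next slide's diagram suggests.

# Hill climbing over binary configurations with single-variable flips (illustrative only).
set.seed(9)
target <- c(1, 0, 1, 1, 0)
score  <- function(x) -sum(abs(x - target))     # stand-in for p(x_I | x_E), up to a constant

x <- rbinom(5, 1, 0.5)                          # random starting point
repeat {
  flips <- lapply(seq_along(x), function(i) { y <- x; y[i] <- 1 - y[i]; y })
  best  <- flips[[which.max(sapply(flips, score))]]
  if (score(best) <= score(x)) break            # no improving local move: stop at a local optimum
  x <- best
}
x                                               # MAP candidate configuration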
  
	
  
MAP in CLG networks
[Diagram: the input stream is distributed (S. Dist) across computational units (C.U.); the Map phase explores multiple starting points and the Reduce phase combines the local solutions]
Inference in dynamic networks
Inference in DBNs faces the problem of entanglement: all variables used to encode the belief state at time t = 2 become dependent after observing {e0, e1, e2}.
Inference in dynamic networks
Inference in DBNs is approached following a Bayesian formulation plus Variational Bayes:
• Variational message passing, based on the variational approximation to a posterior distribution p(x_I), defined as

    q*(x_I) = arg min_{q ∈ Q} D( q(x_I) || p(x_I) ),

  where D(q || p) is the KL divergence from q to p; the optimal variational distribution is computed iteratively. (Focusing instead on D(p(x_I) || q(x_I)) corresponds to expectation propagation.)
• Factored frontier, which assumes independence of the nodes connecting to the past given the observations
Learning CLG networks from data
§ Learning the structure
  § Methods based on conditional independence tests
  § Score-based techniques
§ Estimating the parameters
  § Bayesian approach
  § Frequentist approach (maximum likelihood)
Learning CLG networks from data
§ Bayesian parameter learning
  § Parameters are considered random variables rather than fixed quantities
  § A prior distribution is assigned to the parameters, representing the state of knowledge before observing the data
  § The prior is updated in the light of new data:

      p(θ | d_1, ..., d_n, d_{n+1}) ∝ p(d_{n+1} | θ) p(θ | d_1, ..., d_n)

  § The Bayesian framework naturally deals with data streams
Learning CLG networks from data
Parameter learning by inference — simple example:
• Random walk over Y_1, Y_2, ...
• f(y_t | y_{t-1}) ~ N(y_{t-1}, 1/τ)
• Precision τ is unknown
[Figure: the chain Y_1 → Y_2 → Y_3 → Y_4 → Y_5, extended with τ as a parent of every transition and its own prior hyperparameters]

The Bayesian solution:
• Model unknown parameters as random variables
• Use Bayes' formula with "clever" distribution families:

    f(τ | y_{1:T}, a, b) = f(τ | a, b) \prod_{t=1}^{T} f(y_t | y_{t-1}, τ) / f(y_{1:T} | a, b)

Efficient inference leads to efficient learning!
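A conjugate-update sketch in base R (our own illustration of the "clever distribution families" point, with assumed Gamma(a, b) hyperparameters): for the random walk with Gaussian increments, the posterior of the precision τ is again a Gamma whose parameters depend on the data only through simple sufficient statistics, so it can be updated one observation at a time.

# Conjugate Bayesian update for the precision tau of f(y_t | y_{t-1}) = N(y_{t-1}, 1/tau),
# assuming a Gamma(shape = a, rate = b) prior on tau. Hyperparameters are illustrative.
set.seed(11)
tau_true <- 4                                   # true precision (variance 0.25)
n_obs <- 2000
y <- cumsum(c(0, rnorm(n_obs - 1, sd = sqrt(1 / tau_true))))

a <- 1; b <- 1                                  # prior hyperparameters
incr <- diff(y)                                 # sufficient statistics of the stream
a_post <- a + (n_obs - 1) / 2                   # could equally be updated increment by increment
b_post <- b + sum(incr^2) / 2

a_post / b_post                                 # posterior mean of tau, ≈ 4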
Modeling concept drift with DBNs
[Two slides of figures illustrating concept drift modelling with DBNs]
Part III: Exploratory analysis
Exploratory analysis
§ Exploratory analysis helps us in testing model assumptions
§ It also improves the modeler's knowledge about the problem and its nature
§ Dynamic Bayesian networks aim at modeling complex time correlations
Sample correlogram
§ Let x_1, ..., x_T be a univariate time series. The sample autocorrelation coefficient at lag v is given by

    ρ̂_v = \sum_{t=1}^{T-v} (x_t - x̄)(x_{t+v} - x̄) / \sum_{t=1}^{T} (x_t - x̄)²,

  where x̄ is the sample mean and T is the total length of the series
§ It represents Pearson's correlation coefficient between the series {x_t}_{t ∈ {1,...,T}} and {x_{t+v}}_{t+v ∈ {1,...,T}}
§ The sample correlogram is the plot of the sample autocorrelation ρ̂_v vs. v, for v = 1, ..., M, for some maximum lag M
§ It measures the strength of the unconditional dependences X_t ⊥̸ X_{t+v} for lags v ≥ 1: values of ρ̂_v close to zero indicate strong unconditional independence between X_t and X_{t+v}, while values close to 1 or -1 indicate strong dependence
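In R, the sample correlogram is produced by the built-in acf() function; a minimal example on simulated data (ours):

# Sample correlogram with base R: white noise vs. an autocorrelated AR(1) series.
set.seed(2)
x_iid <- rnorm(500)                                    # i.i.d. data
x_ar1 <- arima.sim(model = list(ar = 0.8), n = 500)    # time-correlated data

op <- par(mfrow = c(1, 2))
acf(x_iid, lag.max = 30, main = "i.i.d. data")         # spikes stay inside the confidence bands
acf(x_ar1, lag.max = 30, main = "AR(1), phi = 0.8")    # slowly decaying correlogram
par(op)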
  	
  
Sample correlogram for independent data
[Figure: correlogram for i.i.d. data]
Sample correlogram for time correlated data
[Figure: correlogram for a time series data]
Sample partial correlogram
§ Consider the regression model

    X_t = a_0 + a_1 X_{t-1} + a_2 X_{t-2} + ... + a_{v-1} X_{t-v+1}

§ Let e_{t,v} denote the residuals of this regression problem (i.e., the error made when estimating X_t using a linear combination of the v - 1 previous observations)
§ The sample partial autocorrelation coefficient of lag v, denoted θ̂_v, is the standard sample autocorrelation between the series {x_{t-v}}_{t-v ∈ {1,...,T}} and {e_{t,v}}_{t ∈ {1,...,T}}
§ It can be seen as the correlation between X_t and X_{t-v} after having removed the common linear effect of the data in between
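The sample partial correlogram is available in R as pacf(); continuing the simulated example (ours):

# Partial correlogram: for an AR(1) process only the lag-1 partial autocorrelation
# should stand out; the rest drop inside the confidence bands.
set.seed(2)
x_ar1 <- arima.sim(model = list(ar = 0.8), n = 500)

op <- par(mfrow = c(1, 2))
pacf(rnorm(500), lag.max = 30, main = "i.i.d. data")
pacf(x_ar1,      lag.max = 30, main = "AR(1), phi = 0.8")
par(op)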
Sample partial correlogram for independent data
[Figure: partial correlogram for i.i.d. data]
Sample partial correlogram for time correlated data
[Figure: partial correlogram for a time series data]
Bivariate contour plots
[Figure: bivariate contour plots of consecutive observations for (a) i.i.d. data and (b) time series data]
For the time series data, the bivariate contour plot shows that X_t and X_{t+1} seem to be distributed according to a bivariate normal whose covariance matrix displays a strong degree of correlation; for the i.i.d. data, the contour plot does not show any temporal dependence between X_t and X_{t+1}.
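A minimal R sketch of such a plot (ours), using kde2d() from the MASS package that ships with R:

# Bivariate contour plot of consecutive observations (X_t, X_{t+1}).
library(MASS)                                         # for kde2d (2D kernel density estimate)
set.seed(2)
x <- arima.sim(model = list(ar = 0.8), n = 2000)
x_t  <- head(x, -1)                                   # X_t
x_t1 <- tail(x, -1)                                   # X_{t+1}

dens <- kde2d(x_t, x_t1, n = 60)
contour(dens$x, dens$y, dens$z, xlab = "X_t", ylab = "X_t+1")   # elongated contours: strong correlation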
The R statistical software
§ R has become a successful tool for data analysis
§ Well known in the Statistics, Machine Learning and Data Science communities
§ "Free software environment for statistical computing and graphics"
§ http://www.cran.r-project.org
The RStudio IDE
http://www.rstudio.com
The R statistical software
• Exploratory analysis demo using R
• LaTeX document generation from R using Sweave
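As a pointer for the demo, here is a minimal Sweave (.Rnw) document of our own showing how R code, output and a figure are woven into LaTeX; it can be compiled with R CMD Sweave report.Rnw followed by pdflatex report.tex. The file name and chunk labels are illustrative.

\documentclass{article}
\begin{document}
\section{Correlogram report}

<<simulate, echo=FALSE>>=
set.seed(2)
x <- arima.sim(model = list(ar = 0.8), n = 500)
@

The series has \Sexpr{length(x)} observations.

<<correlogram, fig=TRUE>>=
acf(x, lag.max = 30)
@

\end{document}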
  
Part IV: The Ramidst package
The Ramidst package
§ The package provides an interface for using the AMIDST toolbox functionality from R
§ The interaction is actually carried out through the rJava package
§ So far Ramidst provides functions for inference in static networks and concept drift detection using DBNs
§ Extensive extra functionality available thanks to the HUGIN link
The AMIDST toolbox
• Scalable framework for data stream processing
• Based on probabilistic graphical models
• Unique FP7 project for data stream mining using PGMs
• Open source software (Apache Software License 2.0)
The AMIDST toolbox official website
http://amidst.github.io/toolbox/
Available for download at GitHub
§ Download:
  :> git clone https://github.com/amidst/toolbox.git
§ Compile:
  :> ./compile.sh
§ Run:
  :> ./run.sh <class-name>
Please give our project a “star”!
Processing data streams in R
§ RMOA
  § MOA is a state-of-the-art tool for data stream mining
  § RMOA provides functionality for accessing MOA from R
  § Several static models are available
  § They can be learnt from streams
  § Streams can be created from csv files or from different R objects
§ http://moa.cms.waikato.ac.nz
The Ramidst package
Inference and concept drift demo using Ramidst
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no 619209.
 

Recently uploaded (20)

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 

Analysis of massive data using R (CAEPIA2015)

  • 12. Why Bayesian networks? Stream → Predictions through open-box models, backed by a black-box inference engine (multi-core parallelization). CAEPIA 2015 Albacete, November 9, 2015 12
  • 13. The AMIDST project § FP7-funded EU project § Large number of variables § Data arriving in streams § Based on hybrid Bayesian networks § Open source toolbox with learning and inference capabilities § Two use cases provided by industrial partners § Prediction of maneuvers in highway traffic (Daimler) § Risk prediction in credit operations and customer profiling (BCC) § http://www.amidst.eu CAEPIA 2015 Albacete, November 9, 2015 13 [Slide figure: excerpt from a Spanish-language poster on modelling with hybrid dynamic Bayesian networks and the results obtained for traffic maneuver prediction]
  • 14. Bayesian  networks   Part  II     CAEPIA 2015 Albacete, November 9, 2015 14
  • 15. Definition § Formally, a Bayesian network consists of § A directed acyclic graph (DAG) where each node is a random variable § A set of conditional probability distributions, one for each variable conditional on its parents in the DAG § For a set of variables X = {X_1, ..., X_N}, the joint distribution factorizes as p(X) = \prod_{i=1}^{N} p(X_i | Pa(X_i)), where Pa(X_i) denotes the parent variables of X_i in the DAG § The factorization allows local computations CAEPIA 2015 Albacete, November 9, 2015 15
  • 16. Reading independencies Independence relations can be read off from the structure. There are three types of connections: § Serial (A → B → C) § Diverging (A ← B → C) § Converging (A → B ← C) CAEPIA 2015 Albacete, November 9, 2015 16
  • 17.   Reading independencies. Example Fire   Temp   Smoke   T1   T2   T3   S1   CAEPIA 2015 Albacete, November 9, 2015 17 •  Knowing  the  temperature  with  certainty  makes  the  temperature  sensor  readings   and  the  event  of  fire  independent   •  The  smoke  sensor  reading  is  also  irrelevant  to  the  event  of  fire  if  Smoke  is  known  for   sure  
  • 18.   Reading independencies. Example Fire   Temp   Smoke   T1   T2   T3   S1   CAEPIA 2015 Albacete, November 9, 2015 18 •  Knowing  the  temperature  with  certainty  makes  the  temperature  sensor  readings   and  the  event  of  fire  independent   •  The  smoke  sensor  reading  is  also  irrelevant  to  the  event  of  fire  if  Smoke  is  known  for   sure   •  If  there  is  no  info  about  Temp  or  sensor  readings,  Sun  and  Fire  are  independent   Sun  
  • 19. Hybrid Bayesian networks CAEPIA 2015 Albacete, November 9, 2015 19 • In a hybrid Bayesian network, discrete and continuous variables coexist • Mixtures of truncated basis functions (MoTBFs) have been successfully used in this context (Langseth et al. 2012) • Mixtures of truncated exponentials (MTEs) • Mixtures of polynomials (MoPs) • MoTBFs support efficient inference and learning in a static setting • Learning from streams is more problematic • The reason is that they do not belong to the exponential family
  • 20. The exponential family CAEPIA 2015 Albacete, November 9, 2015 20 • A family of probability functions belongs to the k-parametric exponential family if it can be expressed as f(x; θ) = \exp\{\sum_{i=1}^{k} Q_i(θ) T_i(x) + D(θ) + S(x)\}, or equivalently f(x; θ) = H(x) C(θ) \exp\{Q(θ) T(x)\} • The T_i functions are the sufficient statistics for the unknown parameters, i.e., they contain all the information in the sample that is relevant for estimating the parameters • They have dimension 1 • We can "compress" all the information in the stream so far as a single number for each parameter
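
To make the "compress the stream" idea concrete, here is a minimal base R sketch (written for this tutorial, not taken from the slides) that summarises a Gaussian stream through its sufficient statistics:

    # For a Gaussian, the sufficient statistics are sum(x) and sum(x^2):
    # the whole stream seen so far is summarised by three numbers (n, s1, s2),
    # updated one observation at a time with constant memory.
    set.seed(8)
    stream <- rnorm(1e5, mean = 10, sd = 2)
    n <- 0; s1 <- 0; s2 <- 0
    for (x in stream) {
      n  <- n + 1
      s1 <- s1 + x
      s2 <- s2 + x^2
    }
    mu_hat  <- s1 / n
    var_hat <- s2 / n - mu_hat^2
    c(mu_hat, var_hat)   # close to (10, 4)
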
  • 21. Hybrid Bayesian networks. CLGs CAEPIA 2015 Albacete, November 9, 2015 21 Conditional Linear Gaussian networks: a Conditional Linear Gaussian (CLG) network is a hybrid Bayesian network where • The conditional distribution of each discrete variable X_D given its parents is a multinomial • The conditional distribution of each continuous variable Z with discrete parents X_D and continuous parents X_C is p(z | X_D = x_D, X_C = x_C) = N(z; α(x_D) + β(x_D)^T x_C, σ(x_D)) for all x_D and x_C, where α and β are the coefficients of a linear regression model of Z given X_C, potentially different for each configuration of X_D • CLGs belong to the exponential family
  • 22. CLGs: Example CAEPIA 2015 Albacete, November 9, 2015 22 [Figure: network with nodes Y, S, W, T, U, where W depends on Y, T depends on W and S, and U depends on W] P(Y) = (0.5, 0.5), P(S) = (0.1, 0.9), f(w | Y = 0) = N(w; 1, 1), f(w | Y = 1) = N(w; 2, 1), f(t | w, S = 0) = N(t; w, 1), f(t | w, S = 1) = N(t; w, 1), f(u | w) = N(u; w, 1)
  • 23. CLGs: Example CAEPIA 2015 Albacete, November 9, 2015 23 [Same network and distributions as on slide 22] § Limitation: discrete nodes are not allowed to have continuous parents § This is not a big problem for Bayesian classifiers
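
A minimal base R sketch (illustrative only, using the conditional distributions listed on slide 22) that forward-samples this CLG network:

    # Y and S are discrete (0/1); W, T and U are conditionally Gaussian.
    set.seed(1)
    n <- 10000
    y <- rbinom(n, 1, 0.5)                      # P(Y) = (0.5, 0.5)
    s <- rbinom(n, 1, 0.9)                      # P(S) = (0.1, 0.9)
    w <- rnorm(n, mean = ifelse(y == 0, 1, 2))  # f(w|Y=0) = N(1,1), f(w|Y=1) = N(2,1)
    tval <- rnorm(n, mean = w)                  # f(t|w,S) = N(w,1) for both S values
    uval <- rnorm(n, mean = w)                  # f(u|w) = N(w,1)
    # Monte Carlo check of the marginal of W implied by the factorization:
    c(mean(w), var(w))   # mixture of N(1,1) and N(2,1): mean 1.5, variance 1.25
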
  • 24. Bayesian network classifiers § The structure is usually restricted § There is a distinguished (discrete) variable called the class while the rest are called features § Examples: naive Bayes and the tree-augmented network (TAN) CAEPIA 2015 Albacete, November 9, 2015 24 [Figure 1: structure of naive Bayes (a) and TAN (b) classifiers. In general, there are several possible TAN structures for a given set of variables; the way to choose among them is to construct a maximum weight spanning tree containing the features, where the weight of each edge is the conditional mutual information between the linked features given the class]
  • 25. Bayesian network classifiers § The class value is determined as c* = arg max_{c ∈ Ω_C} p(c | x_1, ..., x_n), where Ω_C denotes the set of all possible values of the class C § In the case of Naïve Bayes, p(c | x_1, ..., x_n) ∝ p(c) \prod_{i=1}^{n} p(x_i | c), so instead of one n-dimensional conditional density, n one-dimensional conditional densities must be estimated CAEPIA 2015 Albacete, November 9, 2015 25
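
A minimal base R sketch of this classification rule (illustrative data and Gaussian class-conditionals; the variable names are made up for the fire example):

    # Gaussian naive Bayes by hand: estimate p(c) and p(x_i | c) from a labelled
    # sample, then classify with c* = arg max_c log p(c) + sum_i log p(x_i | c).
    set.seed(2)
    train <- data.frame(class = rep(c("fire", "no_fire"), each = 100),
                        temp  = c(rnorm(100, 40, 5),   rnorm(100, 20, 5)),
                        smoke = c(rnorm(100, 0.8, 0.2), rnorm(100, 0.1, 0.2)))
    classes <- unique(train$class)
    prior <- prop.table(table(train$class))
    params <- lapply(classes, function(cl) {
      d <- train[train$class == cl, c("temp", "smoke")]
      list(mu = colMeans(d), sd = apply(d, 2, sd))
    })
    names(params) <- classes
    predict_nb <- function(x) {
      logpost <- sapply(classes, function(cl) {
        p <- params[[cl]]
        log(prior[[cl]]) + sum(dnorm(x, p$mu, p$sd, log = TRUE))
      })
      classes[which.max(logpost)]
    }
    predict_nb(c(temp = 38, smoke = 0.7))   # should return "fire"
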
  • 26. Reasoning over time: Dynamic Bayesian networks § Temporal reasoning can be accommodated within BNs § Variables are indexed over time, giving rise to dynamic Bayesian networks § We have to model the joint distribution over time, p(X_{1:T}) = \prod_{t=1}^{T} p(X_t | X_{1:t-1}), which becomes intractable as t grows § Dynamic BNs reduce the factorization complexity by adopting the Markov assumption, p(X_t | X_{1:t-1}) = p(X_t | X_{t-V:t-1}), where V ≥ 1 is the order of the Markov chain CAEPIA 2015 Albacete, November 9, 2015 26
  • 27. Reasoning over time: Dynamic Bayesian networks § DBN assuming a third-order Markov assumption § DBN assuming a first-order Markov assumption CAEPIA 2015 Albacete, November 9, 2015 27 [Figure: examples of DBNs satisfying a third-order (above) and a first-order (below) Markov property; increasing the Markov order improves the approximation at the cost of a more complex model]
  • 28. Particular cases of Dynamic Bayesian networks § Hidden Markov models § The joint distribution of the hidden (X) and observed (Y) variables is P(X_{1:T}, Y_{1:T}) = \prod_{t=1}^{T} P(X_t | X_{t-1}) P(Y_t | X_t) CAEPIA 2015 Albacete, November 9, 2015 28 [Figure 3.4: an example of a BN structure corresponding to an HMM]
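
A minimal base R sketch (the transition matrix and emission means below are illustrative choices, not taken from the slides) that simulates from this HMM factorization:

    # Two-state HMM with Gaussian emissions: draw from
    # P(X_{1:T}, Y_{1:T}) = prod_t P(X_t | X_{t-1}) P(Y_t | X_t).
    set.seed(3)
    T_len <- 200
    trans <- matrix(c(0.95, 0.05,
                      0.10, 0.90), nrow = 2, byrow = TRUE)  # P(X_t | X_{t-1})
    emis_mean <- c(0, 3)                                     # P(Y_t | X_t) = N(mean[X_t], 1)
    x <- numeric(T_len); y <- numeric(T_len)
    x[1] <- 1
    y[1] <- rnorm(1, emis_mean[x[1]])
    for (t in 2:T_len) {
      x[t] <- sample(1:2, 1, prob = trans[x[t - 1], ])
      y[t] <- rnorm(1, emis_mean[x[t]])
    }
    plot(y, type = "l", main = "Observed stream from a 2-state HMM")
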
  • 29. Particular cases of Dynamic Bayesian networks § Input-output Hidden Markov models § Linear dynamic systems: switching Kalman filter CAEPIA 2015 Albacete, November 9, 2015 29 [Slide excerpt: the IO-HMM adds a top layer of input variables Y'_{1:T} (continuous or discrete) on top of the HMM, the existing observed layer Y_{1:T} being the outputs; it is usually employed in supervised classification, but in AMIDST both inputs and outputs are always observed, so inference only targets the latent variables and the inputs serve to relax the stationarity assumption. With linear Gaussian continuous variables, a DBN corresponds to a (factorized) Kalman filter; the switching Kalman filter adds a discrete state variable that mixes the linear sub-models, representing the belief state as a mixture of Gaussians and coping, to some extent, with violations of the linearity and Gaussian-noise assumptions]
  • 30. Two-time slice Dynamic Bayesian networks (2T-DBN) § They constitute the main dynamic model in AMIDST: an initial model for the first time slice plus a transition model (a standard BN) repeated over time, satisfying the first-order Markov and stationarity assumptions § The transition distribution is p(X_{t+1} | X_t) = \prod_{X_{t+1} ∈ \mathbf{X}_{t+1}} p(X_{t+1} | Pa(X_{t+1})), where Pa(X_{t+1}) refers to the parents of X_{t+1} in the transition model CAEPIA 2015 Albacete, November 9, 2015 30 [Figure 3.7: an example of a BN structure corresponding to a 2T-DBN]
  • 31. Inference in CLG networks § There are three ways of querying a BN § Belief updating (probability propagation) § Maximum a posteriori (MAP) § Most probable explanation (MPE) CAEPIA 2015 Albacete, November 9, 2015 31
  • 32. Inference in CLG networks CAEPIA 2015 Albacete, November 9, 2015 32 Querying a Bayesian network (I) • Probabilistic inference: computing the posterior distribution of a target variable, p(x_i | x_E) = \frac{\sum_{x_D} \int_{x_C} p(x, x_E) dx_C}{\sum_{x_{D_i}} \int_{x_{C_i}} p(x, x_E) dx_{C_i}}
  • 33. Inference in CLG networks CAEPIA 2015 Albacete, November 9, 2015 33 Querying a Bayesian network (II) • Maximum a posteriori (MAP): for a set of target variables X_I, the goal is to compute x*_I = arg max_{x_I} p(x_I | X_E = x_E), where p(x_I | X_E = x_E) is obtained by first marginalizing out from p(x) the variables not in X_I and not in X_E • Most probable explanation (MPE): a particular case of MAP where X_I includes all the unobserved variables
  • 34. Probability propagation in CLG networks: Importance sampling CAEPIA 2015 Albacete, November 9, 2015 34 • Let θ denote the numerator of the posterior distribution of the target variable, θ = \int_a^b h(x_i) dx_i, with h(x_i) = \sum_{x_D ∈ Ω_{X_D}} \int_{x_C ∈ Ω_{X_C}} p(x; x_E) dx_C • Then θ = \int_a^b h(x_i) dx_i = \int_a^b \frac{h(x_i)}{p^*(x_i)} p^*(x_i) dx_i = E_{p^*}[h(X_i^*) / p^*(X_i^*)], where p^* is a probability density function on (a, b) called the sampling distribution and X_i^* is a random variable with density p^* • Therefore, we have transformed the problem of probability propagation into estimating the expected value of a random variable from a sample drawn from a distribution of our own choice
  • 35. Probability propagation in CLG networks: Importance sampling CAEPIA 2015 Albacete, November 9, 2015 35 • The expected value can be estimated using a sample mean estimator. Let X_i^{*(1)}, ..., X_i^{*(m)} be a sample drawn from p*. Then a consistent unbiased estimator of θ is \hat{θ}_1 = \frac{1}{m} \sum_{j=1}^{m} \frac{h(X_i^{*(j)})}{p^*(X_i^{*(j)})}; since \hat{θ}_1 is unbiased, the error of the estimation is determined by its variance • In AMIDST, the sampling distribution is formed by the conditional distributions in the network (evidence weighting)
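
A minimal base R sketch of the estimator \hat{θ}_1 (the target h and the sampling distribution p* below are illustrative choices, not the AMIDST evidence-weighting distribution):

    # Importance sampling estimate of theta = integral of h(x) over (a, b),
    # drawing from a sampling distribution p* of our own choice.
    set.seed(4)
    h <- function(x) dnorm(x, mean = 1, sd = 1)   # toy unnormalised target
    pstar_dens <- function(x) dunif(x, 0, 5)      # p* = Uniform(0, 5)
    m <- 1e5
    xs <- runif(m, 0, 5)                          # sample from p*
    theta_hat <- mean(h(xs) / pstar_dens(xs))     # sample-mean estimator
    theta_hat   # close to pnorm(4) - pnorm(-1), i.e. about 0.84
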
  • 36. Probability propagation in CLG networks: Importance sampling CAEPIA 2015 Albacete, November 9, 2015 36 [Diagram: Stream + sampling distribution → Map phase (computing units performing sample generation in parallel) → Reduce phase (aggregating sufficient statistics) → output stream]
  • 37. Probability propagation in CLG networks: Importance sampling CAEPIA 2015 Albacete, November 9, 2015 37 Response  for  an  input  stream  with  a  network  of  500  variables  
  • 38. Probability propagation in CLG networks: Importance sampling CAEPIA 2015 Albacete, November 9, 2015 38 Response  for  an  input  stream  with  a  network  of  10  variables  
  • 39. MAP in CLG networks CAEPIA 2015 Albacete, November 9, 2015 39 MAP is similar to probability propagation but: • First marginalize out by sum/integral (sum phase) • Then maximize (max phase) The constrained elimination order leads to higher complexity
  • 40. MAP in CLG networks CAEPIA 2015 Albacete, November 9, 2015 40 MAP  in  the  AMIDST  Toolbox     •  Hill  Climbing  (global  and  local  change)   •  Simulated  Annealing  (global  and  local  change)   •  Sampling    
  • 41. MAP in CLG networks CAEPIA 2015 Albacete, November 9, 2015 41 [Diagram: Stream + sampling distribution → Map phase (computing units exploring multiple starting points in parallel) → Reduce phase (combining the local solutions) → output stream]
  • 42. Inference in dynamic networks CAEPIA 2015 Albacete, November 9, 2015 42 Inference in DBNs faces the problem of entanglement: all variables used to encode the belief state at time t = 2 become dependent after observing {e0, e1, e2}.
  • 43. Inference in dynamic networks CAEPIA 2015 Albacete, November 9, 2015 43 • Variational message passing, based on the variational approximation to a posterior distribution p(x_I), defined as q*(x_I) = arg min_{q ∈ Q} D(q(x_I) || p(x_I)), where D(q || p) is the KL divergence from q to p; focusing instead on D(p(x_I) || q(x_I)) corresponds to expectation propagation, and the optimal variational distribution is computed iteratively • Factored frontier, which assumes independence of the nodes connecting to the past given the observations
  • 44. Learning CLG networks from data § Learning the structure § Methods based on conditional independence tests § Score-based techniques § Estimating the parameters § Bayesian approach § Frequentist approach (maximum likelihood) CAEPIA 2015 Albacete, November 9, 2015 44
  • 45. Learning CLG networks from data § Bayesian parameter learning § Parameters are considered random variables rather than fixed quantities § A prior distribution is assigned to the parameters, representing the state of knowledge before observing the data § The prior is updated in the light of new data § The Bayesian framework naturally deals with data streams: p(θ | d_1, ..., d_n, d_{n+1}) ∝ p(d_{n+1} | θ) p(θ | d_1, ..., d_n) CAEPIA 2015 Albacete, November 9, 2015 45
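
A minimal base R sketch of this recursive update (a conjugate Beta-Bernoulli model is chosen here purely for illustration):

    # The posterior after n observations becomes the prior for observation n+1:
    # p(theta | d_1..d_{n+1}) is proportional to p(d_{n+1} | theta) p(theta | d_1..d_n).
    set.seed(5)
    a <- 1; b <- 1                    # Beta(1, 1) prior on the Bernoulli parameter
    stream <- rbinom(1e4, 1, 0.3)     # incoming binary observations
    for (d in stream) {               # one cheap update per item, nothing stored
      a <- a + d
      b <- b + (1 - d)
    }
    a / (a + b)                       # posterior mean, close to 0.3
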
  • 46. Learning CLG networks from data CAEPIA 2015 Albacete, November 9, 2015 46 Parameter learning by inference. Simple example: • Random walk over Y_1, Y_2, ... • f(y_t | y_{t-1}) ~ N(y_{t-1}, τ^{-1}) • The precision τ is unknown [Figure: the chain Y_1 → Y_2 → Y_3 → Y_4 → Y_5]
  • 47. Learning CLG networks from data CAEPIA 2015 Albacete, November 9, 2015 47 Same random walk example, now with τ modelled explicitly. The Bayesian solution: • Model unknown parameters as random variables • Use Bayes' formula with "clever" distribution families: f(τ | y_{1:T}, a, b) = f(τ | a, b) \prod_{t=1}^{T} f(y_t | y_{t-1}, τ) / f(y_{1:T} | a, b) • Efficient inference leads to efficient learning! [Figure: the chain Y_1, ..., Y_5 with τ as an additional parent node governed by hyperparameters]
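
A minimal base R sketch of this example (assuming, for illustration, a conjugate Gamma(a, b) prior on τ, so the posterior stays Gamma and each stream item triggers a constant-time update):

    # Random walk y_t ~ N(y_{t-1}, 1/tau) with unknown precision tau.
    set.seed(6)
    tau_true <- 4
    y <- cumsum(c(0, rnorm(5000, sd = sqrt(1 / tau_true))))  # simulated walk
    a <- 1; b <- 1                                            # Gamma(1, 1) prior on tau
    for (t in 2:length(y)) {                                  # one update per new value
      a <- a + 0.5
      b <- b + 0.5 * (y[t] - y[t - 1])^2
    }
    a / b    # posterior mean of tau, close to 4
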
  • 48. Modeling concept drift with DBNs CAEPIA 2015 Albacete, November 9, 2015 48
  • 49. Modeling concept drift with DBNs CAEPIA 2015 Albacete, November 9, 2015 49
  • 50. Exploratory  analysis   Part  III     CAEPIA 2015 Albacete, November 9, 2015 50
  • 51. Exploratory analysis § Exploratory analysis helps us in testing model assumptions § It also improves the modeler's knowledge about the problem and its nature § Dynamic Bayesian networks aim at modeling complex time correlations CAEPIA 2015 Albacete, November 9, 2015 51
  • 52. Sample correlogram § Let x_1, ..., x_T be a univariate time series. The sample autocorrelation coefficient at lag v is given by \hat{ρ}_v = \frac{\sum_{t=1}^{T-v} (x_t - \bar{x})(x_{t+v} - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}, where \bar{x} is the sample mean and T is the total length of the series § It represents Pearson's correlation coefficient between the series {x_t}_{t ∈ {1,...,T}} and {x_{t+v}}_{t+v ∈ {1,...,T}} § The sample correlogram is the plot of the sample autocorrelation vs. v; values of \hat{ρ}_v close to zero indicate unconditional independence between X_t and X_{t+v} CAEPIA 2015 Albacete, November 9, 2015 52
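
In R the sample correlogram is available directly through acf(); a minimal sketch with illustrative data:

    # Correlogram of an i.i.d. stream versus a time-correlated AR(1) stream;
    # acf() plots exactly the \hat{rho}_v coefficients defined above.
    set.seed(7)
    iid_data <- rnorm(500)
    ar1_data <- arima.sim(model = list(ar = 0.8), n = 500)
    op <- par(mfrow = c(1, 2))
    acf(iid_data, main = "Correlogram: i.i.d. data")
    acf(ar1_data, main = "Correlogram: AR(1) data")
    par(op)
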
  • 53. Sample correlogram for independent data CAEPIA 2015 Albacete, November 9, 2015 53 [Figure: correlogram for i.i.d. data]
  • 54. Sample correlogram for time correlated data CAEPIA 2015 Albacete, November 9, 2015 54 [Figure: correlogram for a time-correlated series]
  • 55. Sample partial correlogram CAEPIA 2015 Albacete, November 9, 2015 55 § Consider the regression model X_t = a_0 + a_1 X_{t-1} + a_2 X_{t-2} + ... + a_{v-1} X_{t-v+1} § Let e_{t,v} denote the residuals of this regression, i.e., the error of estimating X_t from a linear combination of the v-1 previous observations § The sample partial auto-correlation coefficient of lag v, \hat{θ}_v, is the standard sample auto-correlation between the series {x_{t-v}}_{t-v ∈ {1,...,T}} and {e_{t,v}}_{t ∈ {1,...,T}} § It can be seen as the correlation between X_t and X_{t-v} after having removed the common linear effect of the data in between
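
The corresponding R function is pacf(); a minimal sketch with the same illustrative series:

    # Partial correlogram: for an AR(1) process only the lag-1 coefficient
    # should be clearly different from zero.
    set.seed(7)
    iid_data <- rnorm(500)
    ar1_data <- arima.sim(model = list(ar = 0.8), n = 500)
    op <- par(mfrow = c(1, 2))
    pacf(iid_data, main = "Partial correlogram: i.i.d. data")
    pacf(ar1_data, main = "Partial correlogram: AR(1) data")
    par(op)
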
  • 56. Sample partial correlogram for independent data CAEPIA 2015 Albacete, November 9, 2015 56 [Figure: partial correlogram for i.i.d. data]
  • 57. Sample partial correlogram for time correlated data CAEPIA 2015 Albacete, November 9, 2015 57 [Figure: partial correlogram for a time-correlated series]
  • 58. Bivariate contour plots CAEPIA 2015 Albacete, November 9, 2015 58 For time series data, the bivariate contour plot of X_t and X_{t+1} shows them distributed according to a bivariate normal whose covariance matrix displays a strong degree of correlation; for i.i.d. data, the contour plot does not show any temporal dependence between X_t and X_{t-1} [Figure 3.9: bivariate contour plots for a set of i.i.d. and time series data]
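
A minimal sketch in R (using the MASS package, with the same illustrative AR(1) series) reproducing this kind of plot:

    # Kernel-density contour plot of consecutive values (x_t, x_{t+1}).
    library(MASS)
    set.seed(7)
    ar1_data <- as.numeric(arima.sim(model = list(ar = 0.8), n = 2000))
    xt  <- ar1_data[-length(ar1_data)]   # x_t
    xt1 <- ar1_data[-1]                  # x_{t+1}
    dens <- kde2d(xt, xt1, n = 100)      # bivariate kernel density estimate
    contour(dens, xlab = "x_t", ylab = "x_(t+1)",
            main = "Bivariate contour plot of consecutive values")
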
  • 59. The R statistical software CAEPIA 2015 Albacete, November 9, 2015 59 § R has become a successful tool for data analysis § Well known in the Statistics, Machine Learning and Data Science communities § "Free software environment for statistical computing and graphics" http://www.cran.r-project.org
  • 60. The RStudio IDE CAEPIA 2015 Albacete, November 9, 2015 60 http://www.rstudio.com
  • 61. The R statistical software CAEPIA 2015 Albacete, November 9, 2015 61 • Exploratory analysis demo using R • LaTeX document generation from R using Sweave
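
As a minimal illustration of the LaTeX + R combination (a hypothetical report.Rnw file, processed with R CMD Sweave report.Rnw to obtain report.tex):

    % LaTeX with embedded R chunks; Sweave fills in the results and figures.
    \documentclass{article}
    \begin{document}
    <<correlogram, fig=TRUE, echo=FALSE>>=
    ar1_data <- arima.sim(model = list(ar = 0.8), n = 500)
    acf(ar1_data, main = "Sample correlogram")
    @
    The simulated stream above contains \Sexpr{length(ar1_data)} observations.
    \end{document}
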
  • 62. The  Ramidst  package   Part  IV     CAEPIA 2015 Albacete, November 9, 2015 62
  • 63. The Ramidst package CAEPIA 2015 Albacete, November 9, 2015 63 § The package provides an interface for using the AMIDST toolbox functionality from R § The interaction is actually carried out through the rJava package § So far Ramidst provides functions for inference in static networks and concept drift detection using DBNs § Extensive extra functionality available thanks to the HUGIN link
  • 64. The AMIDST toolbox CAEPIA 2015 Albacete, November 9, 2015 64 • Scalable framework for data stream processing • Based on Probabilistic Graphical Models • Unique FP7 project for data stream mining using PGMs • Open source software (Apache Software License 2.0)
  • 65. The AMIDST toolbox official website CAEPIA 2015 Albacete, November 9, 2015 65 http://amidst.github.io/toolbox/
  • 66. Available for download at GitHub CAEPIA 2015 Albacete, November 9, 2015 66 § Download: :> git clone https://github.com/amidst/toolbox.git § Compile: :> ./compile.sh § Run: :> ./run.sh <class-name>
  • 67. Please give our project a “star”! CAEPIA 2015 Albacete, November 9, 2015 67
  • 68. Processing data streams in R CAEPIA 2015 Albacete, November 9, 2015 68 § RMOA § MOA is a state-of-the-art tool for data stream mining § RMOA provides functionality for accessing MOA from R § Several static models are available § They can be learnt from streams § Streams can be created from csv files or from different R objects http://moa.cms.waikato.ac.nz
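
A rough sketch of how such a stream classifier can be set up with RMOA (assuming the HoeffdingTree, datastream_dataframe and trainMOA entry points; names and arguments should be checked against the RMOA documentation):

    # Wrap an R data frame as a stream, train a static MOA model on it,
    # and predict on new instances.
    library(RMOA)
    model  <- HoeffdingTree()                       # a static classification model
    stream <- datastream_dataframe(data = iris)     # stream created from an R object
    fit    <- trainMOA(model = model, formula = Species ~ ., data = stream)
    predict(fit, newdata = iris[1:5, ], type = "response")
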
  • 69. The Ramidst package CAEPIA 2015 Albacete, November 9, 2015 69 Inference and concept drift demo using Ramidst
  • 70. CAEPIA 2015 Albacete, November 9, 2015 70 This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 619209