SlideShare a Scribd company logo
1 of 8
Download to read offline
Oozie at Yahoo
Purshotam Shah, Ryota Egashira
Oozie Meetup 06/03 2014
Table of Contents
Yahoo Confidential & Proprietary
▪Scale at Yahoo!
▪Scale and Performance
▪Features - Usability
▪High availability and Load Balancing
▪Customer Asks
Scale at Yahoo!
▪Busiest cluster
› 1 million+ workflows per month
› 45 - 55K workflows per day
› 40 - 50K coord actions per day
› 800 - 900 coordinators (5m, 15m, 30m, hourly, daily and weekly)
› 30 - 40 bundles
▪Most complex bundle - 230 coordinators
▪Most complex workflow - 85 forks
▪Video Transcoding - 100-300 workflows per min
Scale and Performance
▪ Database
› CLOB to BLOB to compress and store inline
› Remove unnecessary hadoop config stored in protoActionConf
› Select only needed columns instead of loading whole row
› Partition tables by created time (in Oracle)
▪ Other
› Huge improvements to materialization of coordinator actions
› Reduce Launcher overhead
• Merge the number of small files created per action to one sequence file
• Launcher libraries shipped only once to HDFS
• Uber mode launcher with Hadoop 2.x
› Synchronously execute commands without queueing to speed up action transition
› Automatically killing abandoned coordinator job
› gzip compression for Rest API
Features - Usability
▪UI improvements
› Active Jobs, Custom Global Filters, Child Jobs for Pig/Hive actions
▪Faster log streaming with more filters
▪Updating coordinator definition on the fly
▪Rerun workflows without having to specify all properties again
▪Mark coordinator and actions as ignored
▪Sharelib Enhancements
› Update on the fly without failing jobs
› Command to list different sharelib available
› Specify directories using metafile instead of single share lib directory
High Availability and Load Balancing
▪HCat integration
▪SLA
▪Sharelib
▪Server-server authentication
▪Distributed sequence
Customer Asks
▪Coordinator dependency management
› Ability to view dependencies and rerun part of a pipeline
▪Better error handling and automatic retries
▪Ability to Suspend/Turn off SLA alerting
▪One-click launcher log viewing
▪Zero downtime
Rohini Palaniswamy
Mona Chitnis
Michelle Chiang
Purshotam Shah
Ryota Egashira
Olga L. Natkovich
Yahoo Oozie Team

More Related Content

What's hot

Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot GamesMatt Goeke
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieChicago Hadoop Users Group
 
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for HadoopMay 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for HadoopYahoo Developer Network
 
Oozie HUG May12
Oozie HUG May12Oozie HUG May12
Oozie HUG May12mislam77
 
July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessYahoo Developer Network
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud InfrastructureSCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud InfrastructureMatt Ray
 
TXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & TomorrowTXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & TomorrowMatt Ray
 
Ansible for large scale deployment
Ansible for large scale deploymentAnsible for large scale deployment
Ansible for large scale deploymentKarthik .P.R
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayDataWorks Summit
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewMadhur Nawandar
 
Empowering developers to deploy their own data stores
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data storesTomas Doran
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 
RESTFul development with Apache sling
RESTFul development with Apache slingRESTFul development with Apache sling
RESTFul development with Apache slingSergii Fesenko
 
Cool MariaDB Plugins
Cool MariaDB Plugins Cool MariaDB Plugins
Cool MariaDB Plugins Colin Charles
 
Liquibase & Flyway @ Baltic DevOps
Liquibase & Flyway @ Baltic DevOpsLiquibase & Flyway @ Baltic DevOps
Liquibase & Flyway @ Baltic DevOpsAndrei Solntsev
 

What's hot (20)

October 2014 HUG : Oozie HA
October 2014 HUG : Oozie HAOctober 2014 HUG : Oozie HA
October 2014 HUG : Oozie HA
 
Oozie @ Riot Games
Oozie @ Riot GamesOozie @ Riot Games
Oozie @ Riot Games
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for HadoopMay 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
 
Oozie HUG May12
Oozie HUG May12Oozie HUG May12
Oozie HUG May12
 
July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification Process
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud InfrastructureSCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
 
TXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & TomorrowTXLF: Chef- Software Defined Infrastructure Today & Tomorrow
TXLF: Chef- Software Defined Infrastructure Today & Tomorrow
 
Ansible for large scale deployment
Ansible for large scale deploymentAnsible for large scale deployment
Ansible for large scale deployment
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Empowering developers to deploy their own data stores
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data stores
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
RESTFul development with Apache sling
RESTFul development with Apache slingRESTFul development with Apache sling
RESTFul development with Apache sling
 
Cool MariaDB Plugins
Cool MariaDB Plugins Cool MariaDB Plugins
Cool MariaDB Plugins
 
Hadoop HDFS
Hadoop HDFS Hadoop HDFS
Hadoop HDFS
 
JavaCro'14 - Using WildFly core to build high performance web server – Tomaž ...
JavaCro'14 - Using WildFly core to build high performance web server – Tomaž ...JavaCro'14 - Using WildFly core to build high performance web server – Tomaž ...
JavaCro'14 - Using WildFly core to build high performance web server – Tomaž ...
 
Liquibase & Flyway @ Baltic DevOps
Liquibase & Flyway @ Baltic DevOpsLiquibase & Flyway @ Baltic DevOps
Liquibase & Flyway @ Baltic DevOps
 

Viewers also liked

Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NYahoo Developer Network
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 
A Basic Hive Inspection
A Basic Hive InspectionA Basic Hive Inspection
A Basic Hive InspectionLinda Tillman
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieYahoo Developer Network
 

Viewers also liked (6)

Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas NApache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
Apache Hadoop India Summit 2011 talk "Oozie - Workflow for Hadoop" by Andreas N
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 
A Basic Hive Inspection
A Basic Hive InspectionA Basic Hive Inspection
A Basic Hive Inspection
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 

Similar to Oozie at Yahoo

Stack Exchange Infrastructure - LISA 14
Stack Exchange Infrastructure - LISA 14Stack Exchange Infrastructure - LISA 14
Stack Exchange Infrastructure - LISA 14GABeech
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceBrian Culver
 
SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceBrian Culver
 
Service-Oriented Design and Implement with Rails3
Service-Oriented Design and Implement with Rails3Service-Oriented Design and Implement with Rails3
Service-Oriented Design and Implement with Rails3Wen-Tien Chang
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Cloudera, Inc.
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilitycherryhillco
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
MariaDB 10.1 what's new and what's coming in 10.2 - Tokyo MariaDB Meetup
MariaDB 10.1   what's new and what's coming in 10.2 - Tokyo MariaDB MeetupMariaDB 10.1   what's new and what's coming in 10.2 - Tokyo MariaDB Meetup
MariaDB 10.1 what's new and what's coming in 10.2 - Tokyo MariaDB MeetupColin Charles
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesfeng1212
 
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.Dimitris Andreadis
 
Best practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability TutorialBest practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability TutorialColin Charles
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cLeighton Nelson
 
Caching strategies with lucee
Caching strategies with luceeCaching strategies with lucee
Caching strategies with luceeGert Franz
 

Similar to Oozie at Yahoo (20)

Stack Exchange Infrastructure - LISA 14
Stack Exchange Infrastructure - LISA 14Stack Exchange Infrastructure - LISA 14
Stack Exchange Infrastructure - LISA 14
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 Performance
 
SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 Performance
 
Service-Oriented Design and Implement with Rails3
Service-Oriented Design and Implement with Rails3Service-Oriented Design and Implement with Rails3
Service-Oriented Design and Implement with Rails3
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
MariaDB 10.1 what's new and what's coming in 10.2 - Tokyo MariaDB Meetup
MariaDB 10.1   what's new and what's coming in 10.2 - Tokyo MariaDB MeetupMariaDB 10.1   what's new and what's coming in 10.2 - Tokyo MariaDB Meetup
MariaDB 10.1 what's new and what's coming in 10.2 - Tokyo MariaDB Meetup
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
WildFly v9 - State of the Union Session at Voxxed, Istanbul, May/9th 2015.
 
Sql plsql online training
Sql plsql online trainingSql plsql online training
Sql plsql online training
 
Best practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability TutorialBest practices for MySQL High Availability Tutorial
Best practices for MySQL High Availability Tutorial
 
ReactPHP + Symfony
ReactPHP + SymfonyReactPHP + Symfony
ReactPHP + Symfony
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12c
 
Caching strategies with lucee
Caching strategies with luceeCaching strategies with lucee
Caching strategies with lucee
 

Recently uploaded

Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...sahb78428
 
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdfsdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdfJulia Kaye
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecTrupti Shiralkar, CISSP
 
ASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderjuancarlos286641
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Akarthi keyan
 
solar wireless electric vechicle charging system
solar wireless electric vechicle charging systemsolar wireless electric vechicle charging system
solar wireless electric vechicle charging systemgokuldongala
 
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Amil baba
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...soginsider
 
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxUNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxrealme6igamerr
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical SensorTanvir Moin
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxwendy cai
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS Bahzad5
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxHome
 
Graphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesGraphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesDIPIKA83
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxSAJITHABANUS
 
Phase noise transfer functions.pptx
Phase noise transfer      functions.pptxPhase noise transfer      functions.pptx
Phase noise transfer functions.pptxSaiGouthamSunkara
 

Recently uploaded (20)

Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...
 
Lecture 2 .pdf
Lecture 2                           .pdfLecture 2                           .pdf
Lecture 2 .pdf
 
Présentation IIRB 2024 Marine Cordonnier.pdf
Présentation IIRB 2024 Marine Cordonnier.pdfPrésentation IIRB 2024 Marine Cordonnier.pdf
Présentation IIRB 2024 Marine Cordonnier.pdf
 
Présentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdfPrésentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdf
 
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdfsdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
 
ASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entender
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part A
 
solar wireless electric vechicle charging system
solar wireless electric vechicle charging systemsolar wireless electric vechicle charging system
solar wireless electric vechicle charging system
 
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
 
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxUNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical Sensor
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptx
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptx
 
Graphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesGraphics Primitives and CG Display Devices
Graphics Primitives and CG Display Devices
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
 
Phase noise transfer functions.pptx
Phase noise transfer      functions.pptxPhase noise transfer      functions.pptx
Phase noise transfer functions.pptx
 

Oozie at Yahoo

  • 1. Oozie at Yahoo Purshotam Shah, Ryota Egashira Oozie Meetup 06/03 2014
  • 2. Table of Contents Yahoo Confidential & Proprietary ▪Scale at Yahoo! ▪Scale and Performance ▪Features - Usability ▪High availability and Load Balancing ▪Customer Asks
  • 3. Scale at Yahoo! ▪Busiest cluster › 1 million+ workflows per month › 45 - 55K workflows per day › 40 - 50K coord actions per day › 800 - 900 coordinators (5m, 15m, 30m, hourly, daily and weekly) › 30 - 40 bundles ▪Most complex bundle - 230 coordinators ▪Most complex workflow - 85 forks ▪Video Transcoding - 100-300 workflows per min
  • 4. Scale and Performance ▪ Database › CLOB to BLOB to compress and store inline › Remove unnecessary hadoop config stored in protoActionConf › Select only needed columns instead of loading whole row › Partition tables by created time (in Oracle) ▪ Other › Huge improvements to materialization of coordinator actions › Reduce Launcher overhead • Merge the number of small files created per action to one sequence file • Launcher libraries shipped only once to HDFS • Uber mode launcher with Hadoop 2.x › Synchronously execute commands without queueing to speed up action transition › Automatically killing abandoned coordinator job › gzip compression for Rest API
  • 5. Features - Usability ▪UI improvements › Active Jobs, Custom Global Filters, Child Jobs for Pig/Hive actions ▪Faster log streaming with more filters ▪Updating coordinator definition on the fly ▪Rerun workflows without having to specify all properties again ▪Mark coordinator and actions as ignored ▪Sharelib Enhancements › Update on the fly without failing jobs › Command to list different sharelib available › Specify directories using metafile instead of single share lib directory
  • 6. High Availability and Load Balancing ▪HCat integration ▪SLA ▪Sharelib ▪Server-server authentication ▪Distributed sequence
  • 7. Customer Asks ▪Coordinator dependency management › Ability to view dependencies and rerun part of a pipeline ▪Better error handling and automatic retries ▪Ability to Suspend/Turn off SLA alerting ▪One-click launcher log viewing ▪Zero downtime
  • 8. Rohini Palaniswamy Mona Chitnis Michelle Chiang Purshotam Shah Ryota Egashira Olga L. Natkovich Yahoo Oozie Team