We are pleased to invite you to the Spark meetup on Monday, October 26, at 6:00 pm at Google (8 rue de Londres, Paris).
We will be delighted to host three great speakers, some of them coming from the US, who will talk about the latest developments around Spark.

• 6:00-6:15 : Welcome

• 6:15-6:45 : Google Dataproc by Sébastien Agnan, Cloud Platform Sales Engineer at Google, and Vincent Heuschling, General Manager of Affini-Tech

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service.

Sébastien Agnan joined Google for Work in 2012 and is now the technical lead for the Google Cloud Platform offering (IaaS/PaaS/Big Data) in Southern Europe. A specialist in cloud architectures, he helps Google for Work customers design innovative solutions that take advantage of new cloud technologies and architectures such as big data, mobile backends, and real-time bidding. A graduate of ESEO with a specialization in information systems architecture, Sébastien worked as an architect and then in presales at Oracle before joining Google.

Vincent Heuschling is the founder of Affini-Tech, a company dedicated to big data solutions. He leads a team of data engineers who help customers build their big data platforms. As a Google Cloud partner, Affini-Tech uses the Google Cloud Platform every day to run big data solutions such as Hadoop, Spark, and Cassandra.

• 6:45-7:30 : Deep dive into Project Tungsten: Bringing Spark closer to bare metal by Reynold Xin, Co-Founder of Databricks, key Spark Committer

Project Tungsten focuses on substantially improving the efficiency of memory and CPU for Spark applications, to push performance closer to the limits of modern hardware.
This effort includes three initiatives:
1. Code generation: using code generation to exploit modern compilers and CPUs
2. Cache-aware computation: algorithms and data structures to exploit memory hierarchy
3. Memory Management and Binary Processing: leveraging application semantics to manage memory explicitly and eliminate the overhead of JVM object model and garbage collection
Project Tungsten will be the largest change to Spark’s execution engine since the project’s inception. In this talk, we will give an update on its progress and dive into some of the technical challenges we are solving.

Reynold Xin is a committer and PMC member of Apache Spark. He is also a co-founder of Databricks and oversees architectural directions for Spark. Before Databricks, he was pursuing a Ph.D. at the University of California, Berkeley AMPLab, where Spark was born.

• 7:30-8:15 : Spark after dark by Chris Fregly, Principal Data Solutions Engineer at IBM Spark Technology Center in San Francisco

Combining the most popular and technically-deep material from his wildly popular Advanced Apache Spark Meetup, Chris Fregly will provide a code-level deep dive on the latest advancements within the Apache Spark Ecosystem including the following:
1) Spark SQL/DataFrames and the Data Sources API with Cassandra and Elasticsearch
2) Spark Streaming Performance Improvements with Kafka and Kinesis
3) Feature Engineering and Recommender Systems with MLlib/GraphX
4) Approximations and Probabilistic Data Structures with Spark and Twitter’s Algebird
5) Partition Pruning and Predicate Pushdowns with Parquet and ORC
6) Performance Tuning and Mechanical Sympathy with Project Tungsten
This talk features many interesting and audience-interactive demos – as well as code-level deep dives into many of the open source codebases mentioned above.

All code is available on GitHub at the following link:

In addition, all demos and tools are prepackaged into a Docker image and available for download on Docker Hub at the following link:

Chris Fregly is a Principal Data Solutions Engineer for the newly-formed IBM Spark Technology Center, an Apache Spark Contributor, a Netflix Open Source Committer, as well as the Organizer of the global Advanced Apache Spark Meetup and author of the upcoming book, Advanced Spark. Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.
When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at meetups and conferences throughout the world.

• 8:15-9:30 : Networking

