DevConf.cz 2021 has ended
ML / AI / Big Data
Thursday, February 18
 

2:45pm CET

Build an e2e analytics application using DataHub
In this talk you will learn how to build an end-to-end analytics application using (almost) only jupyter notebooks as the basic unit of development.

When developing AI-driven applications, there is often a friction point in the development life cycle: porting code from a data scientist's experimental notebook into the production-ready code a software engineer expects. What if we could remove this friction by running the notebooks themselves in a DAG workflow, connected to each other and able to share and exchange data? That would avoid the porting step altogether, making it simpler for data scientists and software engineers to collaborate and quickly iterate on their AI-driven application.

We will walk you through a case study where we did just that, using the Open Data Hub toolkit (specifically JupyterHub, Ceph, Hive, Hue, Superset, and Argo on OpenShift) to build a recurring email-list analytics and dashboard application. We will highlight some pitfalls we hit along the way, how we could improve in the future with Elyra, and how this process is general enough to apply to many AI-driven application development use cases.
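The notebooks-as-DAG idea can be sketched as a dependency graph in which each node is a notebook and upstream notebooks produce the data that downstream ones consume. A minimal sketch in plain Python (the notebook names and DAG shape are hypothetical; in the talk's setup, Argo or Elyra provides the real scheduler):

```python
from collections import deque

def topo_order(dag):
    """Return an execution order for a {node: [upstream, ...]} DAG."""
    indegree = {n: len(deps) for n, deps in dag.items()}
    downstream = {n: [] for n in dag}
    for node, deps in dag.items():
        for dep in deps:
            downstream[dep].append(node)
    ready = deque(n for n, k in indegree.items() if k == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

# Each node is a notebook; upstream notebooks write data the next ones read.
pipeline = {
    "collect.ipynb": [],
    "clean.ipynb": ["collect.ipynb"],
    "analyze.ipynb": ["clean.ipynb"],
    "dashboard.ipynb": ["analyze.ipynb"],
}
```

Each node would then be executed, for example with a tool such as papermill, once its upstream notebooks have written their outputs to shared storage (Ceph, in the talk's stack).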

Speakers

Michael Clifford

Principal Data Scientist, Red Hat
Michael Clifford is a Data Scientist at Red Hat working in the Office of the CTO on Emerging Technologies, where he works primarily on exploring tools, methodologies and use cases for cloud native data science.

Tom Coufal

Principal Software Engineer, Red Hat
Tom is a Principal Software Engineer at Red Hat and has worked in open source for his entire career. He joined Red Hat 8 years ago as an intern after his freshman year of university. He holds a master's degree in Bioinformatics and Biocomputing. During his time at Red Hat he has had the opportunity to experience... Read More →



Thursday February 18, 2021 2:45pm - 3:10pm CET
Session Room 2

3:30pm CET

Share data without revealing personal information
Deep learning and machine learning more broadly depend on large quantities of data to develop accurate predictive models. In areas such as medical research, sharing data among institutions can lead to even greater value. However, data often includes personally identifiable information that we may not want to (or even be legally allowed to) share with others. Traditional anonymization techniques only help to a degree.

In this talk, Red Hat's Gordon Haff will share the active research taking place in academia, open source communities, and elsewhere on techniques such as differential privacy. The goal of this research and ongoing work is to help individuals and organizations work collaboratively while preserving the anonymity of individual data points.
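As a taste of the technique, here is a textbook-style sketch of the Laplace mechanism from differential privacy applied to a count query (an illustration of the general idea, not code from the talk):

```python
import math
import random

def private_count(values, predicate, epsilon):
    """Count matching records, with Laplace noise calibrated to epsilon.

    A count query has sensitivity 1: adding or removing one person's record
    changes the true count by at most 1, so Laplace noise of scale 1/epsilon
    yields epsilon-differential privacy for this query.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Inverse-CDF sampling of a Laplace(0, 1/epsilon) variate.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

Smaller epsilon means stronger privacy but noisier answers, which is exactly the utility/privacy trade-off the research community is working to characterize.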

Speakers

Gordon Haff

Principal, BitMasons
Gordon Haff is Principal Analyst at BitMasons where he writes and consults with an emphasis on open source and computing infrastructure. At Red Hat, he worked on market insights and portfolio architectures and wrote about tech, trends, and their business impact. His books include... Read More →


Thursday February 18, 2021 3:30pm - 3:55pm CET
Session Room 2

4:30pm CET

Beyond Inference: Bringing ML into Production
Exploiting the business value of data science doesn't end with training a machine learning model; in fact, that is just the beginning. Data scientists want to maximize model performance, while application developers want a deployment that builds repeatably and behaves predictably. Model serving smooths the transition from data science to applications in production. This talk will explain what model serving is and who should care, and show participants how to use the open source model-serving project Seldon Core to serve models on Kubernetes.

This session is told from a data scientist's point of view and documents building a model serving pipeline as a whole. This includes pain points of model serving such as clunky pipelines, but also celebrates the parts that work well, such as scalability within Kubernetes.

The audience will learn the basics of model serving, why it is a relevant issue, and how model serving offers relief for the data scientist/software engineer handoff, and will come away knowing how to deploy their machine learning model with Seldon Core.
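Seldon Core's Python wrapper turns any class exposing a `predict(X, feature_names)` method into a REST/gRPC model server. A minimal sketch (the threshold "model" here is a hypothetical stand-in for a real trained model):

```python
class ToyClassifier:
    """Toy model following the Seldon Core Python server interface.

    Seldon only requires a predict method taking a batch of rows and
    optional feature names; what happens inside is up to the model.
    """

    def __init__(self, threshold=2.5):
        # Stand-in for loading a trained model artifact from storage.
        self.threshold = threshold

    def predict(self, X, feature_names=None):
        # X arrives as a 2-D batch; return one score per row.
        return [[1.0 if sum(row) > self.threshold else 0.0] for row in X]
```

Packaged into a container image, a class like this is typically served with Seldon's `seldon-core-microservice` entrypoint and deployed to Kubernetes via a `SeldonDeployment` custom resource.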

Speakers


Thursday February 18, 2021 4:30pm - 4:55pm CET
Session Room 2

5:00pm CET

Responsible AI: Ethics in Software Development
Responsible AI: Building ethical practices in your software development lifecycle
Advancements in AI are different from those in other technologies because of the pace of innovation and AI's proximity to human intelligence, impacting us at both a personal and a societal level. The industry today is optimistic about the incredible potential for AI and other advanced technologies to empower people, widely benefiting current and future generations and thereby working for the common good.

However, nearly 9 out of 10 organizations across countries have encountered ethical issues resulting from the use of AI. At Microsoft, we recognize that these same technologies also raise important challenges that we need to address clearly, thoughtfully, and affirmatively. In this talk, I will present Microsoft's approach to Responsible AI, covering the six principles for developing technology responsibly.
This will be a focused session where I will start with a demo of how, if not used properly, AI can lead to bias. After that, we will go into Microsoft's principles and how they lead to an efficient system, showcasing how the same demo looks after incorporating the Responsible AI principles.
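One simple way to quantify the kind of bias such a demo surfaces is demographic parity: comparing positive-prediction rates across groups. A generic sketch (this is a standard fairness metric, not Microsoft's tooling, which offers much richer analyses):

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups.

    predictions: iterable of 0/1 model outputs
    groups:      iterable of group labels, aligned with predictions
    A gap near 0 means the model flags all groups at similar rates.
    """
    rates = {}
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        rates[group] = sum(predictions[i] for i in idx) / len(idx)
    ordered = sorted(rates.values())
    return ordered[-1] - ordered[0]
```

A post-mitigation model, like the "after" half of the demo, would show a visibly smaller gap on the same data.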

Speakers

Rishabh Gaur

Technical Architect, Microsoft
I work as a Technical Architect at Microsoft, where I work on advancing Microsoft's world view of Intelligent cloud and Intelligent Edge to our customers and partners. While working as a cross-domain architect, I specialize in Microsoft Power Platform, Azure IoT and Azure App Dev... Read More →


Thursday February 18, 2021 5:00pm - 5:40pm CET
Session Room 2

5:45pm CET

Data Science Meets DevOps: Gitops with OpenShift
OpenShift, a fast-growing open source application platform, makes deploying and managing machine learning models easy and convenient. Since the development of machine learning models is an iterative procedure, many data scientists prefer to store their model source code, such as their Jupyter notebooks, in Git so that they can make frequent updates to their models. These models are then deployed via OpenShift in a production environment. To obtain the most optimized model, the models need to be continuously retrained and redeployed. How can we efficiently manage this periodic retraining and deployment of machine learning models?

In this talk you will learn how to leverage DevOps for managing machine learning models on OpenShift. With the help of CI/CD tools like Tekton pipelines, we can now extend version control for data science applications as well. You will walk away from this talk knowing how to:
1. Train a machine learning model on an example use case
2. Maintain ML model code in Git
3. Set up a CI/CD pipeline for your ML application
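Steps 1 and 2 come down to keeping a small, reproducible training script in Git that a Tekton task can run on each trigger. A hypothetical sketch using scikit-learn (the dataset, paths, and metrics file are illustrative, not the talk's actual use case):

```python
import json
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train(model_path="model.pkl", metrics_path="metrics.json"):
    """Train, evaluate, and write the artifacts a CI/CD pipeline picks up."""
    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accuracy = model.score(X_te, y_te)
    # The pipeline consumes these two files: the model and its metrics.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(metrics_path, "w") as f:
        json.dump({"accuracy": accuracy}, f)
    return accuracy

if __name__ == "__main__":
    print(f"test accuracy: {train():.3f}")
```

A Tekton pipeline step would run this script on every push (or on a schedule for retraining), then a follow-on step could gate deployment on the reported accuracy.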

Speakers

Hema Veeradhi

Senior Data Scientist, Red Hat
Hema Veeradhi is a Senior Data Scientist on the Emerging Technologies team, part of the Office of the CTO at Red Hat. Her work primarily focuses on implementing innovative open AI and machine learning solutions to help solve business and engineering problems. Hema is a staunch... Read More →


Thursday February 18, 2021 5:45pm - 6:25pm CET
Session Room 2
 
Saturday, February 20
 

2:30pm CET

Building Petabyte Scale ML Models with Python
Abstract

Although building ML models on small, toy datasets is easy, most production-grade problems involve massive datasets that current ML practices don't scale to. In this talk, we cover how you can drastically increase the amount of data your models can learn from by using distributed data/ML pipelines.

It can be difficult to figure out how to work with large datasets (which do not fit in your RAM), even if you're already comfortable with ML libraries and APIs within Python. Many questions immediately come up: Which library should I use, and why? What's the difference between 'map-reduce' and a 'task graph'? What's a partial_fit function, and what format does it expect the data in? Is it okay for my training data to have more features than observations? What's the appropriate machine learning model to use? And so on...

In this talk, we'll answer all those questions, and more!

We'll start by walking through the current distributed analytics (out-of-core learning) landscape in order to understand the pain-points and some solutions to this problem.

Here is a sketch of a system designed to achieve this goal (of building scalable ML models):

1. a way to stream instances
2. a way to extract features from instances
3. an incremental algorithm

Then we'll read a large dataset into Dask, TensorFlow (tf.data), and scikit-learn streaming, and immediately apply what we've learned in the last section. We'll move on to the model-building process, including a discussion of which model is most appropriate for the task. We'll evaluate our model a few different ways, and then examine the model for greater insight into how the data is influencing its predictions. Finally, we'll practice this entire workflow on a new dataset, and end with a discussion of which parts of the process are worth tuning for improved performance.
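The streaming-plus-incremental-algorithm pattern above can be sketched with scikit-learn's `partial_fit`, where only one mini-batch is ever in memory. Here a synthetic generator stands in for a real out-of-core source such as Dask or tf.data:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def batches(n_batches=50, batch_size=100, seed=0):
    """Stand-in for streaming instances from disk, Dask, or tf.data."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 5))
        # A simple learnable rule plays the role of the true labels.
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        yield X, y

# SGDClassifier supports partial_fit, so it never needs the full dataset.
model = SGDClassifier(random_state=0)
for X, y in batches():
    model.partial_fit(X, y, classes=[0, 1])  # one mini-batch at a time
```

The same loop works unchanged whether the generator yields 50 batches or 50 million; only the batch, never the dataset, has to fit in RAM.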

Detailed Outline

1. Intro to out-of-core learning
2. Representing large datasets as instances
3. Transforming data (in batches) - live code [3-5]
4. Feature Engineering & Scaling
5. Building and evaluating a model (on entire datasets)
6. Practicing this workflow on another dataset
7. Benchmarking other libraries for OOC learning
8. Questions and Answers

Key takeaway

By the end of the talk, participants will know how to build petabyte-scale ML models beyond the shackles of conventional Python libraries.

Participants will also have benchmarks and best practices for building such ML models at scale.

Speakers

Vaibhav Srivastav

Data Scientist, Deloitte GmbH
I am a Data Scientist and a Master's candidate in Computational Linguistics at Universität Stuttgart. I am currently researching speech, language, and vision methods for extracting value from unstructured data. In my previous stint with Deloitte Consulting LLP, I worked with Fortune... Read More →


Saturday February 20, 2021 2:30pm - 2:55pm CET
Session Room 4

3:00pm CET

Stateful Sessions for Intelligent Apps
Live audio transcription and other similar applications require stateful processing to support both multi-user sessions and dynamic scale-out. We can persist audio state with a Kafka kappa architecture, but that state must also be preserved across the OpenShift cluster boundary to user web clients. Fortunately, OpenShift's sticky sessions allow stateful sessions to be implemented without complicated custom configurations.

In this talk, Gage will explain how to convert your single-user, constrained application to support stateful sessions with any number of users. Using the power of OpenShift and Open Data Hub's data monitoring and streaming tools, a stateful architecture can be developed and managed easily. We will showcase a real-time audio transcription use case, including a Kafka streaming architecture, in a practical data science application.
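Conceptually, the per-session state is a buffer keyed by session ID that must stay reachable from the same pod for the whole session; sticky sessions guarantee that a client's requests keep routing back to the pod holding its buffer. A simplified in-process stand-in (illustrative only, not the talk's Kafka code):

```python
class SessionStore:
    """Per-session buffers of the kind a transcription pod would hold."""

    def __init__(self):
        self._buffers = {}  # session_id -> ordered list of chunks

    def append(self, session_id, chunk):
        # Each incoming audio/text chunk extends that session's state.
        self._buffers.setdefault(session_id, []).append(chunk)

    def transcript(self, session_id):
        # Stand-in for feeding the buffered audio to a speech model.
        return " ".join(self._buffers.get(session_id, []))
```

Without sticky sessions, a second request could land on a pod whose `SessionStore` has never seen that session ID, which is exactly the failure mode the talk's architecture avoids.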

Speakers

Gage Krumbach

AICoE FDE Intern, Red Hat
I have been an intern at Red Hat since summer 2020 and have been working on the Forward Deployed Engineers team inside the AI Center of Excellence.



Saturday February 20, 2021 3:00pm - 3:25pm CET
Session Room 4
 