Scaling Data Science at the EPA

Join Jeff Hollister & David Smith to learn about two examples of EPA's journey in adopting popular open source tools.

November 8, 2019

The adoption of open source tools into production can be a slow and challenging process. The end result, however, is usually an environment where data is more readily available and better decisions can be made. We invite you to join Jeff Hollister & David Smith to learn about two examples of EPA's journey in adopting popular open source tools. This webinar will be an honest look at the challenges they faced and the progress they have made throughout this journey, along with how the EPA is currently leveraging open source tools to do better data science.

Part One: Two steps forward, one step back: a reluctant open data science transformation

Jeff Hollister - Research Ecologist, US EPA

Open science relies on improving the accessibility of all aspects of the research process including code, data, and manuscripts. The tools and concepts that help facilitate this may include open source software, proper licensing, collaboration platforms, and novel modes of publishing. At the US EPA, researchers embrace open science and utilize many of these tools. Jeff will present a user's perspective on several of these topics, implementation pain points, and how he and his colleagues overcame these challenges.

Part Two: The EPA's RStudio environment on Amazon Cloud

Dave Smith - Information Access and Analytical Services, EPA Office of Mission Support

Analysis on traditional desktop workstations often runs into computational and storage limitations, which can necessitate costly hardware upgrades. The EPA has been exploring the potential of the cloud's elasticity to overcome some of these limitations in support of data science work. This effort has involved the use of Docker containerization, automation, and other approaches on AWS to dynamically provision data science environments on demand. Dave will describe this approach in his portion of the presentation.


About the speakers

Jeff is a landscape ecologist with expertise in the spatial component of ecology and environmental sciences. Since May of 2006, he has worked as a Research Ecologist with the US EPA’s Atlantic Ecology Division in Narragansett, RI. Jeff’s past experience is in applications of geospatial technologies (such as geographic information systems, spatial statistics, and remote sensing) to environmental research and broad-scale environmental monitoring, modeling, and assessment. A unifying theme in his research is using Open Science (Open Access, Open Source, and Open Data) to benefit environmental science.

Dave Smith is with the Information Access and Analytical Services division of EPA’s Office of Mission Support. He is currently involved in building out a cloud-hosted enterprise data management and analytics platform to support the data science, machine learning, streaming data analytics, and big data needs of EPA programs and research efforts. Mr. Smith has a background in application development, data management, governance, integration, and analysis, with over 20 years of experience working with geospatial technologies, where he has worked on a wide variety of computational approaches to support data analysis and design.