Stochastic Block Models with R: Statistically rigorous clustering with rigorous code - Nick Strayer

January 31, 2020 Nick Strayer
Often a machine learning research project starts with brainstorming, continues with one-off scripts while an idea forms, and ends with a package written to disseminate the product. In this talk, I will share my experience rethinking this process by spreading the package writing across the whole project. While setting up a package framework carries cognitive overhead, I will argue that this overhead can serve as scaffolding for not only good code but also robust research practices. The result of this experiment is the SBMR package: a native R package for fitting Bipartite Stochastic Block Models and investigating their results, which forms the backbone of my PhD dissertation. By going over the ups and downs of this process, I hope to inspire the audience to move package writing closer to the start of their projects and to meld research and code more closely, improving both.

About the Author

Nick Strayer

I have used R and JavaScript in a variety of positions, including as a journalist in the graphics department at The New York Times and as a 'Data Artist in Residence' at the data visualization startup Conduce. Currently, I am a fifth-year PhD candidate in Biostatistics at Vanderbilt University, focusing on network methods.
