Getting started with 'open science' through blogging

Through a few different projects and people (such as SIPS and rOpenSci and conversations with friends / colleagues both online and offline), I have been exposed to the idea of open science.

I’m actually going to punt for the moment. Here’s a definition that sounds about right to me:

Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open notebook science, and generally making it easier to publish and communicate scientific knowledge.

One way I have found valuable for thinking about how to get started with open science is through blogging. I have blogged for more than 10 years (using WordPress up until this year) but have never had as many people mention that they saw or liked something as I have since I started using Blogdown. Blogdown is a package for the statistical software R. It’s main benefit, apart from allowing you to create websites using the platform Hugo - which serves a similar role as a platform such as WordPress (or Weebly or Wix) - to create websites is it allows you to write text inline with R code.

For example, here is a trivial example using a built-in data set (about diamonds) and the ggplot2 R package for creating figures:

library(ggplot2)

ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
    geom_point()

I have spent some time over the past few months blogging this way. Here are two examples:

Comparing MPlus and MCLUST output (to benchmark an approach to carrying out Latent Profile Analysis in R with the same analysis in MPlus)
Using characteristics of rail-trails to predict how they are rated (to illustrate some key ideas behind mixed-effects models)

Writing both text and code in these ways (and as demonstrated in this post) provides a platform, especially for emerging scholars, to share work in-progress and engage with some of the ideas behind open science, namely writing for an audience of both both professional scientists and other interested in the content and practicing notebook science, making it easier for others to use the code to carry out a new analysis or replicate other analyses.

I have a few specific thoughts relate to why some folks have come across my blog who otherwise might not have, particularly by coming across the post I wrote comparing MPlus and MCLUST output. As I wrote at the start of that post, while MPlus is a widely-used tool to carry out Latent Profile Analysis but there does not seem to be a widely-accepted or used way to carry out Latent Profile Analysis in R.

What the post does that I think is especially useful in compare and contrast the output from the two approaches. There is a lot of interest (I’ll speak for educational researchers) in using open-source tools: The two most common data analysis tools (apart from Microsoft Excel) are SPSS and MPlus followed by SAS [and] Stata. Each has its features (and its challenging aspects) but I think R (and Python, too, in slightly different quarters) matches up to and in some cases improves on what each of them offer. Unlike them, it is cross-platform and freely-available, which means both beginning researchers and scholars doing cutting-edge analyses - and students - can use it for a wide range of purposes. I think comparing two approaches - one using open-source and one using proprietary tools - helps to build confidence (for me and I think for others who are considering using the approach) that the approach both does what we think it does and compares to a common approach that uses other software. This is important because at the moment, I think there is not yet wide as wide acceptance of open-source approaches (including R packages, Python libraries, and stand-alone software) as those in proprietary software. This makes sense but it puts the responsibility on those developing open-source approaches to show how the analysis compares to other approaches.

To circle back to the point of this post, blogging provides a place for making this type of work more open - to inquiring minds including others trying to carry out a similar approach, those learning the approach for whom the tutorial-like writing is helpful, and as a place to ask for and receive feedback from experts. In short, blogging, particularly using Blogdown or something else that allows you to write text inline with (R, Python, or maybe even SPSS and MPlus) code is - at least based on one definition of it - a candidate way to get started with open science. And in that spirit - I welcome any questions, particularly if you are looking to get started with Blogdown. There are probably a lot of opportunities for my peers in educational research to continue to hash out what these ideas mean to us working in a field in which the privacy of the data we collect is of paramount importance. Feel free to check out other posts or pages on this site and send me a message at jrosen@msu.edu or reach out and connect on Twitter.

Getting started with ‘open science’ through blogging

2017/10/01