Data Science in Education Using R is done!
Here’s us (with a few of us a bit in the dark), right around when we clicked “send” on an email with links/attachments for what was a two-year project.
Data Science in Education Using R by the numbers
The project involved a lot of steps, and because we wrote the book in the open, it is possible to describe the process by the numbers.
GitHub
The book was written in R Markdown; from the start, the files were stored in a GitHub repository for the project.
- The first commit, or addition of content, to our GitHub repository was made on 2018-02-03 (so, just over 26 months ago!)
- We made 1,785 additional commits to our repository
- Most of these commits were associated with one of 363 pull requests, or collections of commits, many of which were made to address an issue
- 155 issues were filed to keep track of and address, well, issues
- 15 individuals made contributions through commits to the repository (thank you!); another five individuals contributed by filing issues
- 144 individuals “starred” the repository, and 34 individuals “forked” it, to follow along with its progress
The rendered book
We “rendered” (and have continuously updated) the book as a website via bookdown and, to submit to the publisher, as a number of Word documents, also (mostly) via bookdown. Not to dish on Microsoft Word, but I believe that it and GitHub were the only tools we used that were not open: R, RStudio, git, bookdown, and the R packages we used are all open-source software.
- The book has 20 chapters (check them out: http://datascienceineducation.com/)
- Its word count, depending on whether you use the {wordcountaddin} R package, which does not count links to webpages and a few other parts of the text as words, or Microsoft Word, is 77,470 words ({wordcountaddin}) or 76,745 words (Microsoft Word)
- There are 8,371 lines of code, which produce 5,257 lines of output and generate 44 figures and two tables that we formatted and exported as such (other tables are part of those 5,257 lines of output)
Slack
We also used Slack to stay in touch.
- We sent 12,809 messages; this includes messages sent by many others in the dataedu Slack group, many of whom contributed to the book. Of these, 5,529 messages were in an #authors channel; another 589 or so were in the #data-science-in-education channel for the book; 711 were in other channels, including #general; and most of the remaining messages were between individuals
Meeting
We met regularly.
- This is a bit harder to measure, because some of our meetings were not recorded in calendar invitations, and, later, once we set up calendar invitations, we sometimes decided we didn’t need to meet, but … assuming that we met once every three weeks or so, we met roughly 30 times
Data Science in Education Using R beyond the numbers
The above numbers tell a story, but only a part of it, and maybe not the most important part. It is harder to quantify the story of the book: its premise, the challenges we faced, how we overcame them, and how the five of us who wrote it collectively shaped its direction and nature.
Our work styles, strengths and priorities, and the goals each of us had for the book worked together in a way that led to something that would not have worked, or would have told an incomplete story, had any of us been missing.
In this way, neither the above numbers nor the story of my experience alone begins to fully capture the process we followed over the past two years. On that note, then, thanks, Ryan, Emily, Jesse, and Isabella. I cannot imagine a better group of co-authors (and friends) to write this book with; working with folks who have the combined substantive and technical expertise and with whom you share a vision is p < .001.
It’s probably worth mentioning that we met and got to know one another through Twitter and then Slack (apart from Emily and me, who knew each other through our graduate program at Michigan State University). It is hard for me to imagine another way through which five authors working in different (for many of us, changing) capacities in education could meet and decide to write an open book using tools that are primarily used in software development (git/GitHub) or writing technical books (bookdown). Doing this in the domain of education made this special, I think.
Dedications
This is in the Dedications section of the book; for my part, the book is dedicated to Katie, Jonah, Teri, Joel, Aaron, and Jess (alongside Ryan, Emily, Jesse, and Isabella’s dedications).
Recognitions
I’d also like to recognize a few folks who, were they to read this, might be surprised: Thank you, Andrea Zellner, for opening the door to my use of R, and Tenglong Li and Matt Koehler, for encouraging and supporting me to grow as someone who uses R. Thanks to Leigh Graves Wolf for introducing me to the idea of sharing one’s work in the open and why it matters, and to both Leigh and Bodong Chen for exemplifying doing this; a repository Bodong shared nearly seven years ago was the first I encountered by anyone in education, and was a bit of a revelation.
Acknowledgments
I’d also like to echo our acknowledgments to those who contributed to the book in capacities other than as authors. These contributions made the book better in a way that only those coming to the topic from a similar perspective but with different expertise can.
Next steps
The book will be copy-edited and then proofed, and then it will be available in print and e-book formats. We’ll have a few things to do, too; we plan to keep working together. In the short term, we want to document some of the technical aspects of rendering the book, especially when it comes to meeting the publisher’s specifications, now, while they’re fresh in our minds. We’ll continue to edit the book now and going forward (it’s alive!), so please feel free to make suggestions or edit the book. For now, you can find the book here: https://datascienceineducation.com/. Even after the book is published, the most up-to-date version will always be available there.