I am interested in problems related to forecasting ecological changes and using this information to inform economic decision making. In particular, I am interested in rapid state shifts in forest ecosystems, such as disease outbreaks and mass drought die-offs. Rapid changes in ecosystems can have dramatic effects on human well-being, and also provide insight into ecological processes.
I also have an interest in the intersection of ecology and business. In a previous life I was a corporate management and sustainability consultant. There’s a tremendous culture gap between ecologists and businesspeople, but there are opportunities for both through collaboration in markets for ecosystem services and ecological technologies.
Like every year, the GGE will be out in force at the Ecological Society of America’s meeting! If you’re in Baltimore for #ESA100, come and see the many talks by students in the Ecology and Population Biology graduate groups, as well as many other professors and researchers from Davis. Click on any talk title below to go to the full abstract, time and location in the ESA program.
For the past two years I’ve run the UC Davis R Users’ Group (D-RUG). In this post, I’ll (1) outline the structure of D-RUG, (2) summarize some lessons learned, and (3) discuss how such users’ groups could act to support and complement SWC’s workshops. Per Bill’s suggestion, we could discuss the role of users’ groups at a future instructor hangout.
D-RUG was created with motivations similar to SWC’s – to help scientists learn computing skills. Unlike SWC workshops, our model has been “everyone teaches everyone” rather than “instructor –> learner”. I saw that science graduate students at Davis increasingly needed to use R and similar tools in both their coursework and research, and there was a fair bit of knowledge dispersed among students and faculty in the university. But there was little training on programming, and no forum to share our knowledge.
Our users are primarily graduate students and postdocs from across the university, though they’re skewed towards ecology and related fields. Most have no training in computer science or programming. They take up R either for coursework or when a specific research task demands it. So they are pretty much in the same place as most SWC Workshop students. We try to maintain low barriers to entry for these students.
Our basic operating principle is maintaining low barriers to entry for users of all skill levels. That means meeting students where they are in terms of their tools and needs, and making it as easy as possible for volunteers to pitch in.
Basic elements of the Users’ Group
Work sessions: We have weekly, 2-hour co-work sessions where users come to work on their own projects and get help.
Talks and tutorials: About every other week, the first half of our work sessions is dedicated to a presentation. These are mainly tutorials on tools in R. Sometimes they are “show-and-tell” where a user presents an analysis for feedback. Most speakers walk through a script on a large screen, and Q & A takes up half of the time.
A Q & A listserv: We use Google Groups to manage a listserv where people can ask questions, which is especially helpful for our many users who can’t attend the weekly sessions.
Global fora, like the R mailing lists or Stack Overflow, are great, but in order to scale they need pretty strict rules that are challenging for beginners. Many beginners don’t yet know how to frame questions to get good answers. Since our listserv is small-scale, new users can ask open-ended questions and expect responses more helpful than “RTFM.”
A website and blog: Our blog mostly serves as a way to post materials from our talks, usually Rmd scripts or slides. More recently we have been broadcasting our tutorials with Google Hangouts on Air and posting those videos, as well.
Some lessons learned
After 2 years, D-RUG has proved a success. A typical meeting brings 5-15 members, usually evenly split between regulars and those who come just when they have questions. Attendance tends to be higher towards the beginning of the term and when we have speakers. A fair number of on-campus users (and LOTS of off-campus users) view the tutorials online. 400 members have signed up for the listserv. 10-20% of these are active posters, and most questions get answered in a day or so. Most importantly, I’ve seen many users arrive with little or no experience, and go on to be our most useful helpers and give some of our best tutorials.
Here are a few of the lessons I’ve learned in running D-RUG:
One major challenge is keeping enough advanced users engaged so that they can be a resource for beginners. Advanced users who attend work sessions can expect to spend about 50% of their time answering others’ questions.
The listserv is a low-commitment way for advanced users to help, and even our advanced users ask questions on it sometimes. Our sponsors at Revolution Analytics send us a box of schwag each year – I use these as prizes to recognize helpful volunteers.
Users who started off as beginners but have learned a lot through D-RUG tend to be our most dedicated volunteers.
There’s a trade-off between running D-RUG as a meeting where people regularly give talks, and a space where people come to work. I found a pretty good medium in holding weekly 2-hour work sessions, with talks every other week for half the session during the school year. More people tend to come to those sessions with talks.
It is constant work wrangling people to give talks, and one doesn’t want to burden advanced users already volunteering their time with too many requests to give tutorials. It’s good to encourage people who are learning a new topic to give talks on things they just learned. Also, encourage them to just walk through a script rather than make slides. It’s less work for the presenter, and the script is more useful to others afterwards than slides.
Users are much more likely to be interested in applications than in programming topics. Talks on topics like “How to do AIC model comparison” and “How to prepare plots for publication” have been much better attended (and more heavily trafficked on the blog) than ones on topics like “speeding up code.” This is partially a function of our membership (mostly empirical ecologists, biologists, and social scientists).
It takes time to build a self-sustaining community. For our first year, I had to monitor the listserv to make sure questions didn’t go unanswered, and I ran every work session and bugged power users to attend. Now the listserv runs itself and we have enough regulars at our work sessions. We would like to find some arrangements to help institutionalize the group more, though. One possibility is having professors or TAs for courses using R hold office hours at D-RUG.
Food helps. We have snacks paid for by our sponsor. A lot of expert users and professors stop in for a mid-afternoon sugar infusion, and end up answering questions.
Users’ Groups as a complementary tool for SWC
Users’ groups have the potential to be a complementary tool to SWC workshops. We recruited a number of new members at the June SWC Workshop at Davis, and they’ve been able to practice and build those skills at our work sessions. Users’ groups also may be a good forum to stay connected to and follow up with learners.
Beginners need a minimum amount of knowledge to take advantage of D-RUG, which can be provided by SWC workshops. D-RUG doesn’t have the resources or structure to teach those beginners from scratch. I sometimes point new users who show up with no experience to an appropriate online self-taught course, and encourage them to do the work at our sessions where they can get support. If these members attend a SWC workshop, they will know enough to get started, and to support each other.
The D-RUG model is fairly easy to replicate. Daniel Hocking at the University of New Hampshire has created a similar group. Anyone with the skill level of a typical SWC workshop helper can probably run a users’ group, though it does require a certain amount of time and skill at recruiting on-campus.
I’d like to hear from others who have similar or alternative models. How can these groups connect with SWC to build an ongoing community of learners?
In my first pass at text analysis of the ESA program, I looked at how the frequency of words used in the ESA program differed from last year to this year. There are much more sophisticated ways of looking at word use in text, though, and I began to dive into the text-mining literature to find other ways to draw insight from ESA abstracts.
One method I found is topic modeling using latent Dirichlet allocation (LDA). Using this method, a number of “topics” are identified, each consisting of groups of words that occur together. Documents are treated as linear mixtures of these topics. Carson Sievert recently wrote a great blog post on the rOpenSci blog about this and his package with Kenneth E. Shirley, LDAvis, which produces interactive visualizations of LDA models.
I had trouble wrapping my head around LDA at first, but on Wednesday at ESA, Dave Harris alerted me to an awesome talk by Denis Valle. Valle showed that LDA can be used to model assemblages of species. In this case, we model communities of co-occurring species, rather than topics of co-occurring words, and rather than documents, we have sample locations. He demonstrated the method on USFS Forest Inventory and Analysis data, showing distinct communities of trees in Eastern U.S. forests. Valle’s talk made LDA much clearer to me, and showed that LDA could be an alternative to other methods of community analysis such as ordination. I’m looking forward to the forthcoming paper.
Back to the ESA program: LDA requires that we specify the number of topics, and it then allocates words among those topics. So I fit a series of models and used AIC to determine the best fit:
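As a rough illustration of that selection step, here is a minimal Python sketch. It is not the code behind this post. It assumes each candidate model has already been fit elsewhere and that we have its log-likelihood; the function names, the toy log-likelihood values, the vocabulary size, and the way parameters are counted (one free word probability per topic per vocabulary term) are all assumptions made for the example.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits):
    """Pick the number of topics with the lowest AIC.

    fits maps a candidate number of topics to (log_likelihood, n_params).
    Returns the best number of topics and the AIC score of every candidate.
    """
    scores = {k: aic(ll, p) for k, (ll, p) in fits.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy numbers only (hypothetical, not results from the ESA program).
VOCAB_SIZE = 500  # assumed vocabulary size
toy_log_likelihoods = {2: -52000.0, 4: -49000.0, 8: -47500.0}
toy_fits = {
    k: (ll, k * (VOCAB_SIZE - 1))  # assumed parameter count per model
    for k, ll in toy_log_likelihoods.items()
}

best_k, scores = best_by_aic(toy_fits)
print(best_k, scores)
```

How to count the parameters of an LDA model is itself a judgment call, so the penalty term here should be read as one possible choice.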
The model with the lowest AIC breaks the ESA program into 56 different topics. We can explore this model with an interactive visualization created by LDAvis:
On the left, topics are represented as numbered circles, with the number representing their rank (1 = most common topic), and their size representing relative frequency across all abstracts. They are plotted so that similar topics are clustered together and different topics are farther apart.
Click on any circle and you’ll see a list of the top 30 terms making up that topic on the right. Note that the words are “stemmed” – their suffixes have been removed so as to treat different forms of the same word as a single value. Relative word ranking is a balance between the importance of the word within the topic (red bars) and the frequency of the word across all documents (grey bars). You can adjust this balance using the ‘Lambda’ box on the top-right. Click on any word and you’ll see in what other topics it appears.
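The balance that lambda controls can be written down compactly. In the LDAvis paper by Sievert and Shirley, the relevance of a word to a topic is lambda times the log probability of the word within the topic, plus (1 - lambda) times the log of the word's "lift" (its probability within the topic divided by its overall probability). The Python sketch below applies that formula to made-up probabilities; the function names and the toy numbers are assumptions for illustration, not values from the ESA model.

```python
import math


def relevance(p_word_given_topic, p_word, lam):
    """Relevance of a word to a topic for a given lambda in [0, 1].

    lam = 1 ranks purely by the word's probability within the topic;
    lam = 0 ranks purely by lift (topic probability / overall probability).
    """
    lift = p_word_given_topic / p_word
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)


def rank_terms(topic_probs, corpus_probs, lam):
    """Return the topic's words sorted from most to least relevant."""
    scores = {
        word: relevance(p, corpus_probs[word], lam)
        for word, p in topic_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy probabilities (hypothetical) for three stemmed terms in one topic.
topic_probs = {"speci": 0.10, "fire": 0.05, "burn": 0.02}
corpus_probs = {"speci": 0.08, "fire": 0.005, "burn": 0.001}

print(rank_terms(topic_probs, corpus_probs, lam=1.0))
print(rank_terms(topic_probs, corpus_probs, lam=0.0))
```

With lambda near 1, words that are common everywhere (like “speci”) float to the top; with lambda near 0, rarer words that are concentrated in the topic take over.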
Let’s explore! Clicking on Topic 1, we see very general words that might appear in any abstract. Topic 2 seems to be made up of words related to conservation and planning, showing the importance of this subject in the meeting. But click on “cost” in Topic 2 and you’ll see that this term is important in several other topics: 23 (words involving life-history), 36 (plant-water relations), and 47 (behavioral ecology). Topic 3 consists of general words having to do with methods, but nearby, topics 10 and 11 consist of words related to data collection and modeling, respectively.
One interesting observation is that “California” is most frequent in Topic 42, which consists of words related to marine systems, and is also common in 53 (invasions), 38 (fire), and 36 (plant-water relations). I was unsurprised at the last two (especially given fire’s importance in my last analysis), but hadn’t realized how California-specific marine topics would be.
There’s lots more to explore here, though it’s not always easy to interpret. Some topics seem to map well to ecological sub-fields, while others may be driven by concepts and frameworks that are difficult to classify, or even writing style. What patterns do you see that are worth a harder look?
After my last post text-mining ESA Annual Meeting abstracts, Nash Turley was interested in the presence of the term “natural history” in ESA abstracts. I decided to collect a little more data by including programs back to 2010, giving a five-year data set. Thankfully the program back to 2010 remains in mostly the same format, so it’s easy to pull the data for these additional years.
Now, not all talks that include natural history concepts will include the term “natural history” in their abstracts, but its frequency may be an indicator of importance, and variation in use of the term may yield some insights.
First, I look at what fraction of abstracts mention “natural history” in each of the last five years at ESA.
Over the past five years, <1% of abstracts at ESA have mentioned natural history. No trend is evident.
I reported different numbers for 2013 and 2014 in a tweet last week. These were higher because I counted them by a simple search of the number of occurrences of “natural history” in the whole corpus. This included the affiliation fields. Many presenters at ESA work at natural history museums (see this abstract, for instance). The above numbers now only include abstracts where “natural history” was in the title or abstract text, and now I count abstracts, not occurrences of the phrase.
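The difference between the two counting approaches is easy to see in code. Below is a minimal Python sketch of the second one: it counts abstracts rather than occurrences, and it searches only the title and abstract text, ignoring affiliations. The record layout (dictionaries with `year`, `title`, `abstract`, and `affiliation` keys) and the toy records are assumptions for illustration; they are not the format of the scraped ESA program.

```python
from collections import defaultdict


def fraction_mentioning(records, phrase):
    """Fraction of abstracts per year whose title or abstract contains phrase.

    Each abstract counts at most once, however often the phrase appears,
    and the affiliation field is deliberately not searched.
    """
    phrase = phrase.lower()
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["year"]] += 1
        searchable = (rec["title"] + " " + rec["abstract"]).lower()
        if phrase in searchable:
            hits[rec["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy records (hypothetical).
toy_records = [
    {"year": 2013, "title": "Natural history of oaks",
     "abstract": "We describe oak life cycles.", "affiliation": "A university"},
    {"year": 2013, "title": "Nutrient cycling in lakes",
     "abstract": "We measured biomass.", "affiliation": "A Natural History Museum"},
    {"year": 2014, "title": "Fire and pollinators",
     "abstract": "Natural history notes and natural history collections guided us.",
     "affiliation": "A university"},
    {"year": 2014, "title": "Drought", "abstract": "Trees died.",
     "affiliation": "A university"},
    {"year": 2014, "title": "Networks", "abstract": "Species interact.",
     "affiliation": "A Natural History Museum"},
    {"year": 2014, "title": "Phenology", "abstract": "Timing shifted.",
     "affiliation": "A university"},
]

print(fraction_mentioning(toy_records, "natural history"))
```

In the toy data, the museum affiliations are ignored and the abstract that uses the phrase twice is counted once.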
What are these natural history talks about? To examine this, I looked at the word frequency across all five years of abstracts, finding the most frequent terms besides “natural history”.
For comparison, here are the most common terms across all ESA abstracts this year:
Like other abstracts, talks that mention “natural history” have “species” as the most common term. Interestingly, natural history talks don’t use “plant” as frequently – perhaps other terms are used in botanical contexts. Also, there are relatively more mentions of “students”, perhaps due to greater links between natural history and education. We can see this pattern in a listing of all the talks for 2014, where we see several talks about education, though there are many fascinating basic science talks, as well:
ESA is just around the corner, and many of us are gearing up and trying to figure out a schedule to cover all the talks and people we can pack in. ESA is a big conference and there’s far too much for any one person to see. In the end, everyone experiences a different part of the elephant. However, I thought it would be interesting to take a look at the big picture, and examine the ESA program as a whole to see what could be learned from it. This is the first of (maybe) several posts where I use some basic text-mining tools to explore the content of the ESA program.
First, what are the most common terms in the ESA program?
Few surprises here. “Species” would have been my guess for the top. “Plants” are probably on top because ecologists usually refer to animals by various sub-groups. The rest are fairly ho-hum: ecology and science-y words.
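For readers who want to try a count like this themselves, here is a minimal Python sketch of a term-frequency tally. It is an illustration only, not the code that produced the plot: the stop-word list is a tiny hypothetical one, the abstracts are toy strings, and no stemming is applied.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def top_terms(texts, n=10):
    """Return the n most common non-stop-word terms across texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Toy abstracts (hypothetical).
toy_abstracts = [
    "Plant species respond to fire.",
    "Species interactions in plant communities",
    "Fire alters species composition",
]

print(top_terms(toy_abstracts, n=3))
```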
It’s more interesting to ask how the topics at ESA change from year to year. Below I show the terms whose use in ESA abstracts changed the most between 2013 and 2014:
This paints a much more interesting picture. The rise of California and the fall of Minnesota make sense given the change in the meeting’s location. But we can see the influence of landscape on topics as well. We see fewer words associated with freshwater ecosystems, prairies, and forests this year, and more associated with fire and other plant systems. Also, we see a difference in the kinds of ecology in the program. This year there are fewer words like “biomass” and “nutrient” – those common in ecosystem ecology – and more like “pollinator” [1], “phenology”, and “network” – those associated with the study of species interactions.
It’s possible that these changes are due to changes in what’s popular in ecology, but it is also likely that many of the concepts captured in these terms – ecosystem, community, and landscape ecology – are influenced by region. After all, an ecosystem perspective is likely to dominate in the Midwest, where an abundance of lakes have been important in the research of freshwater nutrient cycling, and a landscape perspective may be important in California, which has such heterogeneity of habitats. This is a pretty good argument for keeping ESA’s location moving, so that no regional perspective dominates every year.
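A year-to-year comparison like the one above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind the plot: it compares each word's share of all words in each year and ranks words by the absolute size of the change. The function names and the toy abstracts are hypothetical, and no stop-word removal or stemming is applied.

```python
import re
from collections import Counter


def relative_freqs(texts):
    """Share of all word tokens accounted for by each word."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def biggest_changes(texts_before, texts_after, n=10):
    """Words with the largest absolute change in relative frequency.

    Returns (word, change) pairs; positive change means the word
    became more common in texts_after.
    """
    before = relative_freqs(texts_before)
    after = relative_freqs(texts_after)
    vocab = set(before) | set(after)
    change = {w: after.get(w, 0.0) - before.get(w, 0.0) for w in vocab}
    ranked = sorted(change.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


# Toy abstracts (hypothetical) standing in for two years of programs.
toy_2013 = ["lake nutrient biomass", "lake forest biomass"]
toy_2014 = ["fire pollinator network", "fire phenology biomass"]

for word, delta in biggest_changes(toy_2013, toy_2014, n=4):
    print(f"{word}: {delta:+.3f}")
```

Using shares rather than raw counts keeps the comparison fair when one year's program is larger than the other's.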
These are the biggest changes, but have the biggest topics changed? The plot below is similar to that above, but instead of plotting the words with the greatest absolute change, I plot the change of the 50 words that are most common across both years:
This is somewhat less clear. If one squints, one could argue that there are more words associated with species interactions, environmental change, and management at the top, and more words associated with forests at the bottom. Words in the middle (“ecology”, “community”) are consistently popular across both years. Finally, perhaps significance is falling out of fashion?
That’s just a quick first pass. I haven’t yet thought much about how one models these data to understand effect sizes and significance. I welcome suggestions for further analyses and better ways to plot/organize these data. Check out this repository on GitHub for the code that generated these plots and how to grab the ESA program text for your own use. See you in a few weeks!
P.S. While messing with the ESA program text, I also created @esa_titles, a twitter account that re-mixes ESA talk titles. Have a look for talks you wish you could see. 🙂
[1] “pollinia” is a stand-in for all pollination-related words here, as I applied stemming to the text.
About the Blog
The EGSA Blog features the science, writing, art, and philosophical musings great and small of students in the UC Davis Graduate Group in Ecology.