1 Thinking like a Modeler

1.1 How to approach this course

Learning how to code and model is a lot like learning how to cook in many ways. You can be pretty bad at it and still do it, it takes a lot of time to get good at it, and there are many, many different ways to be good at it (and a whole lot of different and often exciting or truly horrendous ways to be bad at it!).

1. It is not about content mastery.

This course likely requires a different mindset from what you have encountered previously. To be a specialist in any field, including biology, there is a lot of content that you need to have under your belt, which is why many undergraduate (and graduate) courses emphasize understanding principles and knowing certain key information (e.g. population dynamics, cell division, anatomy and physiology, evolutionary principles, etc.). In contrast, this course is much more about building your toolset and learning techniques, then recombining them in new ways when faced with different modeling problems. In many ways, learning how to analyze your data, how to code, how to specify models, etc. is like learning how to cook.

Think of content mastery as foraging - if you are in the woods collecting mushrooms, you must know which ones are edible and which are poisonous and how to tell the difference. Knowing what you can and cannot eat is essential background information. Modeling, on the other hand, is more like cooking - what are you going to do with what you’ve gathered?

2. It is better to know of many methods than to know a few in-depth.

We have 84 hours together this semester. With that time, I could teach you one specific model each week and we could go very in-depth to know all the details, assumptions, modifications, etc. To put it in a cooking context, I could teach you one recipe each week and by the end of the course you would know how to make 14 of my favorite dishes. Or, instead, you could be exposed to many different types of food that you might like to make depending on what you’re in the mood for, covering a variety of cooking styles, techniques, and cuisines. Knowing that these dishes exist and having a general sense of what you want to make gives you many more options than just knowing how I make guacamole.

3. Not all methods are suitable for your research questions.

Even knowing that there are many dishes you could make, not all of them will suit you. Some weeks the dishes won’t work for your dietary needs (e.g. maybe you never have nested data) or you will wonder why anyone would ever want to cook French food. Then, we will get to Mexican food the next week (er, I mean Principal Components Analysis) and it will be exactly what you were looking for. Even on weeks when the methods are not currently relevant to your research, it is still good to know that they exist. You may encounter them in other ways and be glad you have a basic understanding (e.g. as a peer reviewer, or food critic).

4. You can always look up how to implement a method.

If you know a dish exists but you are not entirely sure how to make it, you can look up a recipe and follow along. The same is true of models. Some recipes are not exactly what you want, but if you know what you’re looking for, eventually you’ll find a good example, guide, or tutorial. And sometimes, you may go to a restaurant (i.e. a journal article), taste something you really like, and try to recreate it!

5. Practicing is important.

The first time you try a new recipe, it may not go well, but you’ll get better with practice. And at least you tried! With coding, you will need to keep at it and you will learn new tricks the more you practice.

6. Selecting methods a priori is better than post-hoc.

Sometimes, you have been handed ingredients and are trying to find a recipe that will work with what you have. For example, in a data context, your PI may give you a dataset, you may be working with historical data, or you collected the data before having a plan for analysis. In an ideal world, you know what you are going to cook in advance so you have all the ingredients that you need on hand.

1.2 Shifting how you think

Becoming ‘quanty’ requires a shift in how you think about the world. For some of us, it is a very intuitive way to approach things. For others, it may be a bit dissimilar to how you usually approach things. There are a few general mindset shifts that I think can be helpful as you embark on your journey of becoming better at quantitative reasoning, modeling, and coding.

1) You are learning a new language

The best predictor of being good at coding is not experience with math, computers, or anything techy - it is being good at learning new languages. I think this applies not just to the coding portion of the course, but really everything with thinking like a modeler. What modelers do, routinely, is translate from one language to another. Models can be described verbally. Let’s go on a throwback to high school physics: Newton’s Laws of Motion. Newton’s Second Law says that the acceleration of any object depends on how much mass it has, and how much force is applied to it in a direction. This statement can be translated into an equation as \(F=ma\). We could also represent it graphically.
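As a small worked example of this translation: the verbal statement is phrased in terms of acceleration, so the equation can be rearranged to match it. Here \(a\) is acceleration, \(F\) is the net force applied, and \(m\) is mass.

\[
a = \frac{F}{m}
\]

Read back into words: for a fixed mass, doubling the force doubles the acceleration; for a fixed force, doubling the mass halves it.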

It is important to get comfortable moving between these three modes of thinking about models: verbal or conceptual descriptions, equations, and graphical models. It is also okay to only be comfortable with one of those modes, and to get better as time goes on at expressing models in the other methods. For instance, right now you may be most comfortable with a verbal model describing how you think the world works, and unfamiliar or even intimidated by the others. That’s okay! We’ll get there.

You could also add a fourth mode (code), but I would actually caution against that. Many people become too reliant on expressing models in code, which makes them irreproducible in the long run and less easy to translate. Code is ephemeral. Package versions change, languages change, and there are

2) Crappy sketches are lifesavers

One of the best ways to change how you think about models and put yourself in a quantitative mindset is to sketch out some rough ideas. They don’t need to be pretty, they just need to convey what you’re thinking. Whiteboards, a legal pad, a pile of scratch paper, a notebook, or the back of an envelope are all helpful things to keep handy for explaining a model.

A series of scatterplots can go a long, long way toward walking through your thought process. Sketching them forces you to carefully explain the relationship that you predict to exist between two or more variables, which can help you better understand your own thinking about how the world works. We will talk about this (how the world works) more in the next lecture on hypothesis framing. For now, let’s just think about representing our ideas in terms of sketches. Think of these sketches as a storyboard of your hypotheses. Each individual component tells you a little bit more about the overall model even if each is an incomplete bit.

You can tell what a sketch is meant to represent even from pretty incomplete drawings. It is kind of like Pictionary. Similarly, you can tell what a model is even from a really rough representation. Arrows in conceptual models can be especially useful to demonstrate how you are thinking about the causal relationship between variables or what you think the true underlying ecological process is, even if you are not adopting a causal modeling framework.

How to start a sketch:

   1) Draw an x-axis and a y-axis.
   2) Think about what your response variable is - what are you interested in knowing about and how it changes? Label that as your y-axis.
   3) What do you think causes changes in your response variable? Label that on your x-axis.
   4) Ask yourself, at low values of X, what do I expect to be the relative level of Y? Place a point there.
   5) Ask yourself, at high values of X, what do I expect to be the relative level of Y? These don’t need to be precise estimates - just higher, lower, a lot higher, etc. Place a point there.
   6) Continue this process for a few intermediate points, especially if you think there may be non-linear relationships. Keep track of any thoughts you have like ‘well, it depends on Z…’ because that might suggest you’ve got some moderating variables and you might need to introduce a second line, perhaps with a different color.
   7) Step back and look at your graph. Does it pass the gut check? If not, maybe revise it.
   8) Criticize your graph and question the key assumptions. Under what scenarios does it break down? Are there situations you aren’t considering? When is this relationship not true? Is it generally true though? Are there alternatives you aren’t considering?
   9) Add lines or colors to your graph representing hypothesized relationships under different conditions, with different sets of moderator variables, etc.
   10) If there are alternative explanations, sketch them out as well. Keep track of which hypotheses go with which graph in your mind (i.e., what is your explanation for why you might see the pattern you predict to observe).
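The steps above can also be mimicked in a few lines of code when no whiteboard is at hand. What follows is a minimal sketch in Python (standard library only) that places a handful of relative guesses on a crude text grid. The function name `rough_sketch`, the 0 to 10 scale, and the example guesses are all hypothetical choices made for illustration; they are not part of the course material.

```python
def rough_sketch(points, width=21, height=11, marker="o"):
    """Draw hypothesised (x, y) points on a crude text grid.

    Both x and y are relative levels on a 0-10 scale (an assumption made
    here for illustration), so 0 means "low" and 10 means "high".
    """
    # Start with an empty grid; row 0 is the top of the plot.
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round(x / 10 * (width - 1))
        row = (height - 1) - round(y / 10 * (height - 1))
        grid[row][col] = marker
    # Add a y-axis on the left and an x-axis underneath.
    lines = ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


# Hypothetical hypothesis: Y rises quickly with X and then levels off.
guesses = [(0, 1), (2, 5), (5, 8), (8, 9), (10, 9)]
print(rough_sketch(guesses))
```

The output is deliberately ugly. The point is only to commit to low, intermediate, and high guesses before looking at any data; a second call with a different marker could stand in for the 'it depends on Z' line.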

3) Being vague-ish helps

General patterns are far, far more important than details. What we are interested in from a modeling standpoint is the patterns we can explain. We are rarely interested in the exception to the rule, because it is extremely difficult to model or predict the exceptions. Instead, we can model the rule, and then those exceptions become interesting challenges either to help us better understand the assumptions we have baked into our model, or to discover some truly interesting ecological phenomena we don’t yet understand. In other words, the exceptions and outliers are interesting and let us ask why, but, when we are developing a model, we should aim to fit the general pattern and not encompass absolutely everything.

Ignore the small things. Simplifying, sometimes to the extreme of oversimplification, is the best place to start with a model. What are the core things that are most important to include? What is irrelevant or can be ignored, at least for now, to be able to focus on the big picture changes? If you can explain your model to a toddler, you’re on the right track. I like to keep the Up Goer Five in mind as a good model of how a rocket ship works, but in toddler terms. You can even try testing out how you describe your model to make sure it is simple enough with the Simple Writer.

4) You’re wrong most of the time

When you’re modeling, intuition goes a long way, but be prepared to justify it. You can of course start from first principles and work your way up to what you suppose to be true, but you can also start from what you suppose to be true and then work backwards from there to figure out your own underlying assumptions. You can then also work from those core assumptions to see if there are other possibilities.

You should also be comfortable being good enough. Not great, not perfect, but good enough. Don’t let the perfect be the enemy of the good. If you get like 80% of the way there with a model, either in terms of how much it explains, how satisfied you are with it, or any other metric, that’s pretty good! Much better than 0% or even 10% of the way there. Nothing has to get to 100% because let’s face it, you’re always going to be wrong in some ways and that’s perfectly okay.

5) Rules are meant to be broken

There are a lot of ‘rules’ to follow in statistics, but (and again, stealing this from Quantitude!) similar to the pirate’s code, “they’re more like guidelines” anyway. Just because something has always been done one way, or is done commonly in the literature, doesn’t mean it is the right way to do something. Just because something hasn’t been done also doesn’t mean that it can’t or shouldn’t be done. So hang the rules, and do what makes biological sense.

6) Question everything

We make a lot of assumptions in modeling. It is very important to question those assumptions. Once you’ve expressed a model (verbally, graphically, mathematically, in code, or otherwise), ask yourself: under what conditions would we expect this model to hold up? When would we not see this pattern? What happens at zero? Under what conditions does your model fail? If it fails, does it fail correctly? Does it fail when it should?
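One way to make these questions concrete is to evaluate a candidate model at the edges of its range. The sketch below, in Python, compares two hypothetical models of abundance along some gradient x: a straight line and an exponential decay curve. The function names and all parameter values are made up for illustration.

```python
import math


def linear_abundance(x, intercept=50.0, slope=-8.0):
    """Straight-line model: abundance changes by a fixed amount per unit x."""
    return intercept + slope * x


def exponential_abundance(x, intercept=50.0, rate=0.3):
    """Exponential decay model: abundance changes by a fixed proportion per unit x."""
    return intercept * math.exp(-rate * x)


# Probe both models at zero, at a moderate value, and well beyond it.
for x in [0, 5, 10]:
    lin = linear_abundance(x)
    expo = exponential_abundance(x)
    # Abundance cannot be negative, so a negative prediction is a red flag.
    flag = "  <- impossible" if lin < 0 else ""
    print(f"x={x:>2}  linear={lin:7.2f}  exponential={expo:7.2f}{flag}")
```

With these made-up values the two models agree at zero, but the straight line predicts a negative abundance at high x, which cannot be true biologically. The exponential curve stays positive, so if it is wrong it is at least wrong in a plausible way.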

Be your own devil’s advocate and question your model structure, the data generating process, and the assumptions baked into your model. Can you answer why you think something should behave in the way you expect? What would happen if one of your assumptions were not true? What if there were no gravity? If we changed one part, what would happen?

Also remember that there could be nearly infinite different models, and you’re likely only considering a few of them. Keep all the possible models in mind, and consider if any of them are better representations of what you’re trying to represent. Don’t put all your eggs in one basket and fully convince yourself that you’ve developed the best model, because let’s face it, your model is definitely wrong, and maybe there are better models. Be okay with being wrong, and interrogate yourself and your model as you do frequent reality checks.

7) Picture yourself as a fish

The single most important “rule” (keeping in mind we just tossed rules out the window) in ecological modeling is that your models must be biologically relevant. Start with the natural history of the system. What do you know about the species, populations, or communities you are modeling? What can and cannot be true in those systems? Don’t let the models lead your thinking; force the models to help you answer the questions you want to answer. Similarly, start with the biological question you want to answer or hypothesis you want to address, not what the data are or what they will allow you to do.

I say ‘picture yourself as a fish’ because I think it can really help sometimes to anthropomorphize the problem. If you’re trying to figure out how a model should work, draw what you expect or predict to observe, and think about your underlying hypotheses, it can sometimes help to insert yourself into the scenario. Keep the ecological and evolutionary context in mind.

1.3 The end goal

The goal of this course is to get you to think quantitatively, i.e. to recognize that data are generated by deterministic and stochastic processes, both of which are defined by you (or your understanding of the biology of the system that you are modeling or making inference on). No one can really tell you how to model or analyze your data - there is no lookup table for ecological modeling.
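To make the idea of deterministic and stochastic processes concrete, here is a minimal sketch in Python that simulates data from a model you define yourself. The linear form, the normally distributed noise, the parameter values, and the function names are all assumptions chosen for illustration; a different system would call for different choices.

```python
import random


def deterministic(x, intercept=2.0, slope=0.5):
    """Deterministic process: the expected response at a given x."""
    return intercept + slope * x


def simulate(xs, sd=1.0, seed=42):
    """Add a stochastic process (normal noise) to the deterministic expectation."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    return [deterministic(x) + rng.gauss(0.0, sd) for x in xs]


xs = list(range(11))
ys = simulate(xs)
for x, y in zip(xs, ys):
    print(f"x={x:>2}  expected={deterministic(x):5.2f}  simulated={y:5.2f}")
```

Setting `sd=0.0` removes the stochastic part and returns the deterministic expectation exactly, which is a quick way to see the two processes separately.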

After this course, you should know enough to 1) be dangerous on Google / StackOverflow to find analytical / modeling approaches that match your questions and data, and 2) write or adapt code to implement those models and not be reliant on a GUI. The goal is not complete mastery of statistics, modeling, or data analysis.

The goal of this course is to get you closer to becoming dangerous.