No accounting for taste? Giant sloths, ancient pumpkins, and evolutionary genetics in bitter taste receptors.

Though domesticated pumpkins and other gourds (think zucchinis, acorn squashes, butternut squashes) are edible (and tasty!), their wild cousins produce a toxic bitter compound, rendering them poisonous to humans even in small amounts. Anyone who has ever picked a pumpkin and hauled it home might be wondering…why on earth would a plant produce fruit that weighs more than some dogs but that no one can eat?

Well, it turns out there's an answer. And it involves some Jurassic Park-level science (but better, because it's real). Read on, fellow nerds…


Ask a Neuroscientist: Thinking Beyond the Halle Berry Neuron

Is it possible to measure the occurrence of a thought and its corresponding firing neuron - does the thought have to be present in a firing neuron, for it to exist? If so, which comes first - or are they one and the same thing?

These questions cut right to the heart of what many neuroscientists find fascinating about the brain and why we choose to study it. Essentially all neuroscientists believe that thoughts are purely an effect of firing neurons. But which one comes first? And can individual neurons be responsible for individual thoughts? 


How to diagnose anterograde amnesia

How do we tell when someone is experiencing anterograde amnesia?

Anterograde amnesia refers to the inability to lay down new memories. People with anterograde amnesia may not perceive any symptoms, or they may be profoundly confused and disoriented. Kelly Zalocusky describes the symptoms of anterograde amnesia and explains the differences between this particular type of memory deficit and another common form of memory impairment, dementia.


NIH announces first round of BRAIN Initiative Awards

Stanford faculty members Mark Schnitzer, associate professor of applied physics and biology, and Michael Lin, assistant professor of pediatrics and bioengineering, were among the first round of BRAIN Initiative awardees announced on September 30. Their project is titled "Protein voltage sensors: kilohertz imaging of neural dynamics in behaving animals".


Ask a Neuroscientist: Why does the nervous system decussate?

Our latest question comes from Dr. Sowmiya Priyamvatha, who asks: I've learnt that tracts to and fro from the brain cross. Why should they cross? Is there any evolutionary significance for that? I know left side of the brain controls right and vice versa but why?

Your question is actually hotly debated among evolutionary biologists and neuroscientists. There are, in fact, multiple theories about why tracts cross in the human nervous system. My favorite theory, though, has to do with the evolution of the entire vertebrate lineage. It is called the “somatic twist” hypothesis[i], and it asserts that neural crossings (technically called “decussations”) are the byproduct of a much larger evolutionary change—the switch from having a ventral (belly-side) nerve cord to a dorsal (back-side) nerve cord.


Why most neuroscience findings are false, part II: The correspondents strike back.


In my May post for this blog, I wrote about a piece by Stanford professor Dr. John Ioannidis and his colleagues, detailing why, as they put it, "small sample size undermines the reliability of neuroscience." [See previous blog post: Why Most Published Neuroscience Findings are False] As you might imagine, Ioannidis's piece ruffled some feathers. In this month's issue of Nature Reviews Neuroscience, the rest of the neuroscience community has its rejoinder.

Here is a brief play-by-play.

Neuroscience needs a theory.

First up: John Ashton of the University of Otago, New Zealand. He argues that increasing the sample size in neuroscience is not the most important problem facing analysis and interpretation of our experiments. In fact, he says, increasing the sample size just encourages hunting around for ever-smaller and ever-less-meaningful effects. With enough samples, any effect, no matter how small, will eventually pass for statistically significant. Instead, he believes neuroscientists should focus on experiments that directly test a theoretical model. We should conduct experiments that have clear, obviously-nullifiable hypotheses and some predictable effect size (based on the theoretical model). Continuing to chase after smaller and smaller effects, without linking them to a larger framework, he argues, will cause neuroscience research to degenerate into "mere stamp collecting" (a phrase he borrows from Ernest Rutherford...who believed that "all science is either physics or stamp collecting".)

Ioannidis and company reply, first by agreeing that having a theoretical framework and a good estimate of effect size would be great, but these ideals are not always possible. They also state that sometimes very small effects are meaningful, as in genome-wide association studies, and that larger sample size will provide a better estimate of those effect sizes.

“Surely God loves the 0.06 nearly as much as the 0.05”

Next up: Peter Bacchetti of the University of California, San Francisco. Like Ashton, Bacchetti believes that small sample size is not the real problem in neuroscience research. He identifies yet another issue in our research practices, however, arguing that the real problem is a blind adherence to the standard of p = 0.05. Dichotomizing experimental findings into successful and unsuccessful bins (read...publishable and basically unpublishable bins) based on this arbitrary cutoff leads to publication bias, misinterpretation of the state of the field, and difficulty generating meaningful meta-analyses (not to mention the terrible incentive placed on scientists to cherry-pick data, experiments, animals, analyses, etc. that “work”).

Ioannidis and colleagues essentially agree, saying that a more reasonable publication model would involve publishing all experiments’ effect sizes with confidence intervals, rather than just p-values. As this "would require a major restructuring of the incentives for publishing papers" and "has not happened," however, Ioannidis and company argue that we should fix a tractable research/analysis problem and do our experiments with a more reasonable sample size.

Mo samples mo problems.

Finally: Philip Quinlan of the University of York, UK. Quinlan cites a paper titled "Ten ironic rules for non-statistical reviewers" to make the argument that small sample size studies really aren't so bad after all. Besides, he says, experiments that require a large sample size are just hunting for very small effects.

Ioannidis and company essentially dismiss Dr. Quinlan entirely. They respond that underpowered studies will necessarily miss effects that are not truly huge. Larger studies allow a more precise estimation of effect size, which is useful whether the effect is large or small, and finally, what constitutes a "meaningful" effect size is often not known in advance. Such an assessment depends entirely on the question and data already at hand.

There you have it, folks! If you have any of your own correspondence, feel free to post it in the comments section.

The Nature Reviews Neuroscience Commentaries

Commentary by John Ashton

Commentary by Peter Bacchetti

Commentary by Philip Quinlan

Response by Button et al.

Of mice and men: on the validity of animal models of psychiatric disease


As biomedical researchers, we use animal models as a compromise. We hope to understand human disorders and improve human health, but the experiments we do are often too risky for human subjects. One largely unspoken concern about this compromise is the degree to which these animals’ behaviors accurately model the disorder in question. What do we even mean when we say that a particular rodent behavior “models” a human syndrome? And why is it that, very often, treatments that work in animal models fail once they reach the clinical setting (1)?

There is an extensive literature in psychology on the various ways to assess the validity of tests and models (2), and the biomedical research community would do well to consider this long philosophical struggle. As a behavioral ecologist and ethologist, though, I see one potential gold-standard question for animal models that is rarely, if ever, discussed: are apparent similarities between the human and the animal behavior driven by homology, or are they analogies, driven by convergent evolution?

Analogy vs. Homology

As I see it, one major flaw in the design of animal models is in mistaking analogy for homology. That is, neuroscientists often study an animal’s behavior because it resembles an interesting human behavior. Take, for example, mouse models of obsessive-compulsive disorder. The goal is not to understand why some mice groom too much, but instead to understand why some humans wash their hands too much. Mouse grooming is an analogy for hand washing. These studies are only useful, then, if mouse grooming and human hand-washing rely on the same neural circuitry. For these studies to be meaningful, the two behaviors must be homologous.

What does it mean to be homologous?

[Figure: homologous forelimb structures in bats, birds, humans, and whales; image via http://askabiologist.asu.edu/]

Homology means having evolved from the same ancestral structure or behavior. If, for example, you wanted to understand the structure of bat wings, but could not get the permits to study bats, you could reasonably study bird wings as a model. You could also study human arms, or even whale flippers. The only reason such studies would be useful is that bat wings, bird wings, human arms, and whale flippers have very similar, evolutionarily homologous structures (see figure). Even though whale flippers are not used for flight (“And the rest, after a sudden wet thud, was silence…”), their structure can tell you a lot about how bat wings are likely put together.

An analogous behavior or structure, on the other hand, is one that looks similar across species but likely occurs for different reasons or through entirely different mechanisms. A bat wing and a butterfly wing are analogous—while they look similar, and evolved to promote the same behavior, they are evolutionarily and structurally distinct. Attempting to learn about the skeletal structure of bats’ wings by studying butterflies would be a largely fruitless endeavor.

The difficulty, of course, in studying psychiatric disease is that most psychiatric diseases are defined by a cluster of symptoms—not by an underlying physiological process. For the researcher, this means that it is challenging to know whether you are studying the right physiological process at all. If a particular assay, based originally on analogy, repeatedly fails to translate in clinical trials—for example, if social behavior assays in mouse autism, or over-grooming in mouse OCD, or refusing to swim in mouse depression repeatedly let clinicians down—perhaps we, as a community, should consider this potential reason why.

Sources

  1. http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1000245
  2. Messick, S. (1989). Meaning and Values in Test Validation: The Science and Ethics of Assessment. Educational Researcher, 18, 5-11.

 

A Tale of Two Papers

As behavioral neuroscientists, we hope that our findings generalize beyond the exact conditions of our experiments and, in many cases, beyond the species we choose to study. This is particularly true in labs that study models of psychiatric disease. Recent high-profile co-publications on compulsive behavior and on depression, however, call this idea into question. Here I'll discuss these two pairs of papers, with an eye toward their implications for generalizability.


Why most published neuroscience findings are false


Stanford Professor Dr. John Ioannidis has made some waves over the last few years. His best-known work is a 2005 paper titled "Why most published research findings are false."(1) It turns out that Ioannidis is not one to mince words.

In the May 2013 issue of Nature Reviews Neuroscience, Ioannidis and colleagues specifically tackle the validity of neuroscience studies (2). This recent paper was more graciously titled "Power failure: why small sample size undermines the reliability of neuroscience," but it very easily could have been called "Why most published neuroscience findings are false."

Since these papers outline a pretty damning analysis of statistical reliability in neuroscience (and biomedical research more generally), I thought they were worth a mention here on the Neuroblog.

Ioannidis and colleagues rely on a measure called Positive Predictive Value, or PPV, a metric most commonly used to describe medical tests. PPV is the likelihood that, if a test comes back positive, the result in the real world is actually positive. Let's take the case of a throat swab for a strep infection. The doctors take a swab from the patient's throat, culture it, and the next day come back with results. There are four possibilities.

  1. The test comes back negative, and the patient is negative (does not have strep). This is known as a "correct rejection".
  2. The test comes back negative, even though the patient is positive (a "miss" or a "false negative").
  3. The test comes back positive, even when the patient is negative (a "false alarm" or a "false positive").
  4. The test correctly detects that a patient has strep throat (a "hit").

In neuroscience research, we hope that every published "positive" finding reflects an actual relationship in the real world (there are no "false alarms"). We know that this is not completely the case. Not every single study ever published will turn out to be true. But Ioannidis makes the argument that these "false alarms" come up much more frequently than we would like to think.

To calculate PPV, you need three other values:

  1. the threshold of significance, or α, usually set at 0.05.
  2. the power of the statistical test. If β is the "false negative" rate of a statistical test, power is 1 - β. To give some intuition--if the power of a test is 0.7, and there are 10 studies done that all are testing non-null effects, the test will only uncover 7 of them on average. The main result in Ioannidis's paper is an analysis of neuroscience meta-analyses published in 2011. He finds the median statistical power of the papers in these studies to be 0.2. More on that later.
  3. the pre-study odds, or R. R is the prior odds that any given relationship tested in the field is non-null (that is, the ratio of true relationships to null relationships among all those being tested). In other words, if you had a hat full of little slips of paper, one for every single experiment conducted in the field, and you drew one out, R is the odds that that experiment is looking for a relationship that exists in the real world.

For those who enjoy bar-napkin calculations--those values fit together like this:

$$PPV = \frac{(1 - \beta)\, R}{(1 - \beta)\, R + \alpha}$$

Let's get back to our medical test example for a moment. Say you're working in a population where 1 in 5 people actually has strep: a prior probability of 0.2, which works out to pre-study odds R = 0.2/0.8 = 0.25. The power of your medical test (1 - β) is 0.8, and you want your threshold for significance to be 0.05. Then the test's PPV is (0.8 * 0.25) / (0.8 * 0.25 + 0.05) = 0.8. This means that 80% of the times that the test claims the patient has strep, this claim will actually be true. If, however, the power of the test were only 0.2, as Ioannidis claims it is broadly across neuroscience, then the PPV drops to 50%. Fully half of the test's positive results would be false positives.
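If you want to plug in your own numbers, here is a quick sketch of that arithmetic in Python. This is my own illustration, not code from the original post, and the ppv helper and its argument names are just for demonstration:

```python
# A minimal sketch of the PPV arithmetic described above; not code from the
# original post. "power" is 1 - beta, "R" is the pre-study odds, and "alpha"
# is the significance threshold, as defined in the text.

def ppv(power: float, R: float, alpha: float = 0.05) -> float:
    """Positive predictive value: (1 - beta) * R / ((1 - beta) * R + alpha)."""
    return (power * R) / (power * R + alpha)

# The strep-test example: 1 in 5 people infected, so pre-study odds R = 0.2 / 0.8.
print(ppv(power=0.8, R=0.25))       # -> 0.8
print(ppv(power=0.2, R=0.25))       # -> 0.5

# Grad-student pessimism: 1 in 10 tested relationships is real, so R = 0.1 / 0.9.
print(ppv(power=0.2, R=0.1 / 0.9))  # -> ~0.31
```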

In a clinical setting, epidemiological results frequently give us a reasonable estimate for R. In neuroscience research, this quantity might be wholly unknowable. But, let's start with the intuition of most graduate students in the trenches (ahem...at the benches?)...which is that 90% of experiments we try don't work. And some days, even that feels optimistic. If this intuition is accurate, then only 10% of relationships tested in neuroscience are non-null in the real world, for pre-study odds R of about 0.11 (1 to 9).

Using that value, and Ioannidis's finding that the median power in neuroscience is only 20%, we learn that the PPV of neuroscience research, as a whole, is (drumroll........) about 30%.

If our intuitions about our research are true, fellow graduate students, then fully 70% of published positive findings are "false positives". This result furthermore assumes no bias, perfect use of statistics, and a complete lack of "many groups" effect. (The "many groups" effect means that many groups might work on the same question. 19 out of 20 find nothing, and the 1 "lucky" group that finds something actually publishes). Meaning—this estimate is likely to be hugely optimistic.

If we keep 20% power in our studies, but want a 50/50 shot of published findings actually holding true, the pre-study odds (R) would have to be 0.25; in other words, 1 in every 5 relationships tested would have to be real.

To move PPV up to 75%, the pre-study odds would have to climb to 0.75, meaning roughly 3 of every 7 relationships tested in neuroscience (about 43%) would have to be non-null.

1 in 10 might be pervasive grad-student pessimism, but 3 in 7 is absolutely not the case.
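To check those last two numbers, you can invert the PPV formula and solve for the pre-study odds R. Here is a minimal sketch, again my own illustration rather than anything from the post, assuming 20% power and α = 0.05:

```python
# Invert PPV = (power * R) / (power * R + alpha) to find the pre-study odds R
# needed to reach a target PPV. A sketch under the post's assumptions
# (power = 0.2, alpha = 0.05); not code from the original article.

def required_odds(target_ppv: float, power: float = 0.2, alpha: float = 0.05) -> float:
    """Pre-study odds R needed so that the PPV equals target_ppv."""
    return (target_ppv * alpha) / (power * (1.0 - target_ppv))

for target in (0.50, 0.75):
    R = required_odds(target)
    share = R / (1.0 + R)  # convert odds back to a proportion of tested relationships
    print(f"PPV = {target:.2f}: R = {R:.2f} ({share:.0%} of tested relationships real)")
```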

So—how can we, the researchers, make this better? Well, the power of our analyses depends on the test we use, the effect size we measure, and our sample size. Since the tests and the effect sizes are unlikely to change, the most direct answer is to increase our sample sizes. I did some coffee-shop-napkin calculations from Ioannidis’s data to find that the median effect size in the studies included in his analysis is 0.51 (Cohen’s d). For those unfamiliar with Cohen’s d—standard intuition is that 0.2 is a “small” effect, 0.5 is a “medium” effect, and 0.8 constitutes a “large” effect. For those who are familiar with Cohen’s d…I apologize for saying that.

Assuming that the average effect size in neuroscience studies remains unchanged at 0.51, let’s do some intuition building about sample sizes. For demonstration’s sake, we’ll use the power tables for a 2-tailed t-test.

To get a power of 0.2, with an effect size of 0.51, the sample size needs to be 12 per group. This fits well with my intuition of sample sizes in (behavioral) neuroscience, and might actually be a little generous.

To bump our power up to 0.5, we would need an n of 31 per group.

A power of 0.8 would require 60 per group.
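If you would rather reproduce these figures than trust my napkin, here is a minimal sketch using the power calculator in statsmodels for an independent two-sample, two-tailed t-test. The choice of statsmodels is mine, not the author's, and it assumes Cohen's d = 0.51 and α = 0.05 as above:

```python
# A sketch for reproducing the power/sample-size figures above, assuming an
# independent two-sample, two-tailed t-test with Cohen's d = 0.51 and
# alpha = 0.05. Uses statsmodels, which is my choice of tool, not the post's.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved with 12 animals per group (comes out near 0.2).
print(analysis.power(effect_size=0.51, nobs1=12, alpha=0.05, alternative='two-sided'))

# Sample size per group needed to reach 50% and 80% power
# (roughly 31 and 60 per group, in line with the figures above).
for target in (0.5, 0.8):
    n_per_group = analysis.solve_power(effect_size=0.51, alpha=0.05, power=target,
                                       alternative='two-sided')
    print(f"power {target}: n = {n_per_group:.0f} per group")
```

The exact outputs may differ by an animal or two from printed power tables, but the ballpark is the same.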

My immediate reaction to these numbers is that they seem huge—especially when every additional data point means an additional animal utilized in research. Ioannidis makes the very clear argument, though, that continuing to conduct low-powered research with little positive predictive value is an even bigger waste. I am happy to take all comers in the comments section, at the Oasis, and/or in a later blog post, but I will not be solving this particular dilemma here.

For those actively in the game, you should know that Nature Publishing Group is working to improve this situation (3). Starting next month, all submitting authors will have to go through a checklist, stating how their sample size was chosen, whether power calculations were done given the estimated effect sizes, and whether the data fit the assumptions of the statistics that are used. On their end, in an effort to increase replicability, NPG will be removing all limits on the length of methods sections. Perhaps other prominent publications would do well to follow suit.

Footnotes

1.  Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124

2. Button et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14: 365-376. doi:10.1038/nrn3475

3. The specific announcement detailing changes in submission guidelines, also the Nature Special on Challenges in Irreproducible Research

 

An Open Letter to Michael Keller

Mr. Keller: As Publisher of both the Stanford University Press and HighWire Press (a division of the Stanford University Libraries), you understand the value of the free and broad dissemination of knowledge.

You must also appreciate the threat that the Research Works Act (HR 3699) poses to the open exchange of ideas. This exchange is central to scientific progress and is the most fundamental means the scientific community has to return the public investment on our research. In limiting access to publicly-funded research, this act stands against the stated mission of both Stanford University Press and HighWire Press, as well as the motto of Stanford University itself.

I therefore urge you to join other respected members of the Association of American Publishers, including AAAS, the MIT Press, and the University of California Press, in publicly stating their opposition to the Research Works Act.

Sincerely, Kelly Zalocusky

PhD Candidate Stanford University Neuroscience Program

-------------------------------------------------------------

In case Mr. Keller is not an avid NeuroBlog reader, I have also sent him the letter directly. I encourage my fellow NeuroBlog readers to do the same. Really, truly--feel free to copy, paste, and send this exact letter. Michael A. Keller can be reached at Michael.Keller@Stanford.edu

Admin Note:

As a colleague and fellow PhD Candidate at Stanford University, I wholeheartedly agree with Ms. Zalocusky's sentiments, and I applaud her for speaking out against the Research Works Act (HR 3699). I hope that readers will join us in contacting both members of the Association of American Publishers and our elected representatives in national government to protest the Research Works Act.

Astra Bryant

PhD Candidate Stanford University Neuroscience Program