Predicting h-index

What is your future impact?

Researchers Acuna, Allesina, and Kording decided to use machine learning to find out. They recently published a Nature article, “Future impact: Predicting scientific success,” that describes their method and findings.

Their goal was to predict a scientist’s future h-index given his or her current bibliographic data. I wrote about discovering the h-index two years ago. Nowadays, Google Scholar will calculate this value for you. It’s a measure of research impact, defined as the largest number h such that h of your papers have at least h citations each.

Acuna et al. collected data on 3,085 neuroscientists and performed a linear regression on these features:

  • n: number of papers written
  • h: current h-index
  • y: years since publishing first article
  • j: number of distinct journals published in
  • q: number of articles in Nature, Science, Nature Neuroscience, Proceedings of the National Academy of Sciences, and Neuron

They found that this five-factor model predicted the future h-index better than the current h-index alone did. Their R² value for predicting h-index one year into the future was 0.92; five years out, 0.67; and ten years out, 0.48. They concluded that the raw h-index by itself is less predictive than a model that also captures the scientist’s “breadth” (in j) and the quality of the publication venues (in q).
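At its core, the approach is a linear regression on the five features n, h, y, j, and q. As a minimal sketch of that general technique (plain ordinary least squares via NumPy, which may differ from the authors’ exact fitting procedure), the code below fits an intercept plus one weight per feature and scores the fit with R², the statistic quoted above. The function names, the data, and the weights are all hypothetical placeholders, not the study’s data or coefficients.

```python
import numpy as np

# Feature order (assumed for illustration): papers, current h, years, journals, top-venue papers.
FEATURES = ["n", "h", "y", "j", "q"]


def fit_linear(X, target):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    # Prepend a column of ones so the first fitted weight acts as the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(A, target, rcond=None)
    return weights[0], weights[1:]


def predict(intercept, coefs, X):
    """Predicted future h-index for each row of X."""
    return intercept + X @ coefs


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: 200 made-up "scientists", not real bibliographic data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(200, len(FEATURES)))
made_up_weights = np.array([0.1, 0.9, 0.2, 0.05, 0.3])  # hypothetical, not from the paper
h_future = 1.0 + X @ made_up_weights + rng.normal(0, 2.0, size=200)

b, w = fit_linear(X, h_future)
print("intercept:", round(b, 2))
print("weights:", dict(zip(FEATURES, np.round(w, 2))))
print("R^2:", round(r_squared(h_future, predict(b, w, X)), 3))
```

An R² near 1 means the features explain almost all of the variation in the target; the drop from 0.92 to 0.48 in the study reflects how much harder it is to predict ten years out than one.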

You can try out their model on your own data, although they note that it is “probably reasonably precise for life scientists, but likely to be less meaningful for the other sciences.” Also, you’ll have to wait the corresponding number of years to see if it comes true. Or you can plug in your data from a few years ago and see how the predictions match the present. Using my data from two years ago (h-index 12), their system predicts that my h-index this year should reach 19. Google Scholar pegs it at 17 right now, so either I am not reaching my proper potential, or their model is wrong. ;)

There’s more than recreational fun going on here. The authors note that h-index values may be used in tenure decisions. In that context, the ability to predict a candidate’s h-index five years into the future could have even more impact—if it were sufficiently reliable. As usual, we can hope that such decisions are made with more than just these impoverished metrics in mind!

Blocking Flash in Chrome

I use the Flashblock add-on for Firefox, which keeps Flash content from running until I click on it to allow it. This makes for a calmer, quieter, more enjoyable web browsing experience. (You can add permanent always-run exceptions for well-behaved, regularly visited sites so that clicking to run Flash doesn’t get annoying.)

Lately I’ve started experimenting with Chrome, and I wanted to see if it had a similar add-on. There is a Google Chrome extension called FlashBlock, but it turns out that you don’t need to explicitly add this (or anything). Thanks to these instructions, I learned that all you have to do is:

  1. Type “about:flags” in a new browser window/tab. Scroll down to where it says “Click to play” and click “Enable.” Scroll to the bottom of the page and click “Relaunch browser.”
  2. In your relaunched browser (took me two launches), go to Preferences (Cmd-,), click “Under the Hood,” click “Content Settings,” then scroll down to “Plug-ins” and select the “Click to play” radio button (which is not visible if you haven’t done step 1).
  3. Go to some Flash-y site like youtube.com and test it out. Nice!

Tea-making in action

I recently had the pleasure of seeing tea being made into tea bags, right before my eyes! While in Boulder, CO, for a conference, I stopped by the Celestial Seasonings tea factory. They have not only a wonderful gift shop but also a free tea-tasting bar filled with great art and a free tour of their factory facilities.

After donning a hair net (plus beard net for whiskered men), we entered the factory and got to see black tea being milled (chopped up), filling the air with the most delicious odors. We walked past bales of herbs piled to the ceiling, filled with hibiscus and chamomile and tilia and all sorts of other things. We entered the tea room, where actual tea (black, green, and white) is stored, and then the “world famous” mint room, which of course is filled with mint. It turns out that a room full of mint bales, kept closed 99% of the time, builds up an overpowering mintness. Two feet into the room, my nose started to tingle and then burn faintly. I couldn’t get back out because of the flow of people coming in, so I edged over to the spearmint side of the room since it was less painful than the peppermint side.

Next we entered the main assembly room floor. This was so awesome I’m having trouble putting it into words. It was heaven for any tea-loving geek: Willy Wonka’s Chocolate Factory, but with tea! Little conveyor belts sent half-assembled boxes of tea zooming around the room, pausing to be folded or stamped or sealed or wrapped in plastic, all by amazing automated machines. I wanted to stop and stare and figure out all of their gears and mechanics, but the tour kept pushing onward. Perhaps most intriguing was their “Robotic Palletizer”, which picked up packed cartons of tea boxes in groups of six and stacked them precisely on a pallet. Later I saw the whole pallet being spun so it could be wrapped in plastic: a 6-foot stack of tea cartons all wound up like a cocoon. I could have spent the whole afternoon watching this busy, enchanting process.

Right there at the factory, the various herbs and constituents are magically converted into a lovely beverage experience. They mill, mix, and bag the tea (using unique no-string teabags so as to save frightening amounts of paper), then deposit the bags into boxes that are sealed and sent off for distribution and sale. You can get some glimpses of this geeky awesomeness through the Celestial Seasonings virtual tour; click on the tea cups marked “3” and “4”. Enjoy!

Measuring burnout

Wikipedia has an entry describing burnout, specifically in the work context. (It’s not really relevant how I ended up there. Right.) I learned that there is a “well studied measurement of burnout” called the Maslach Burnout Inventory. It makes use of a “three dimensional description of exhaustion, cynicism, and inefficacy”. Maslach originally characterized burnout for professions such as psychology and social work, in which those experiencing burnout can not only be ineffective but start to view their patients or clients in a depersonalized or dehumanized way. But anyone in any profession can suffer ill effects from reaching unbearable levels of frustration and exhaustion. A later study showed that the MBI had “sufficient fit” as a descriptive/diagnostic tool for various occupations, except for those in advertising (hm?).

While the MBI itself is only available by purchase, you can take a quick self-test to get an idea of your burnout level (if it isn’t already obvious to you). The same site also provides tips on avoiding burnout. The one that resonated most with me was advice to “protect the parts of your job that give you meaning and satisfaction.” When too much of your time is swallowed up in the dreck that provides no satisfaction, but from which you cannot escape, you automatically ratchet up the burnout scale. It’s good to be reminded that taking time to focus on the parts you really enjoy benefits everyone you work with, not just you.

Scientific impact, coarsely measured

Recently at work, a new person we are hiring was described as having a high “h-index”. I had never heard of this term, so I looked it up later. The h-index is short for Hirsch index and was proposed by Jorge E. Hirsch as a method for quantitatively characterizing scientific impact through publications. It is defined as:

A scientist has index h if h of [his/her] Np papers have at least h citations each, and the other (Np – h) papers have at most h citations each.
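Hirsch’s definition translates directly into a short computation: sort the citation counts from most to least cited, then find the last rank i at which the i-th paper still has at least i citations. The Python sketch below is only an illustration, not how Google Scholar computes the value; the function name `h_index` and the sample citation counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    # The paper at (1-based) rank i supports h >= i if it has at least i citations.
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts only decrease from here while ranks increase, so stop.
            break
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # → 4
```

Choosing the largest such h also satisfies the second half of the definition: the paper at rank h + 1 (if any) has fewer than h + 1 citations, so it and every paper after it have at most h.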

Intrigued, I went off and calculated my own h-index, which (using citation data from Google Scholar) is 12.


According to the Wikipedia entry on the h-index, that’s a decent score for use in “tenure decisions”, while getting up to about 18 might rate a full professorship. Of course, this is a coarse metric, and like all simple metrics it has drawbacks. It doesn’t factor in the number of other authors on the paper, or whether the citations are self-citations, or how the paper is cited (in a substantive manner vs. as one member of a long list of work cited in the introduction). But who doesn’t enjoy a moment of quantitative navel-gazing? Calculate away!
