Today's academic research is a competitive enterprise, similar in many respects to sports. "Publish or perish" is no joke but hard reality for researchers who have to compete for funding, for publications, for fame. But can we measure their performance and, perhaps, see who "holds the world record"?
Ever since academic research ceased to be a hobby of the blue-blooded, there has been an interest in measuring scientific performance. There are numerous rankings and evaluations of scientific institutions, and they play an important role, perhaps too important, in political decision-making. While the "validity" of quantifications of scientific output has been questioned, for good reasons, I still find them inspiring and at times even amusing. They certainly have their own role in this economy. However, to my disappointment, I have not been able to find systematic measurements of my own field, human-computer interaction (HCI), quite likely because the field is so young. (See also this.) In the spirit of the science=athletics equation, I wanted to find out what the "world record" is there and who holds it. Who are the athletes and how good are they? If a new athlete enters this arena, what are his/her chances of success?
To satisfy my curiosity, I needed to produce some measurements, and I decided to start from a particular scientific conference, CHI. The annual ACM Conference on Human Factors in Computing Systems, or CHI for short, corresponds, if any event does, to the 100 metres sprint in HCI. It is undoubtedly the most competitive conference with the broadest attendance. To "measure CHI", I made a small bot that crawled the ACM Digital Library on 6th March 2007, starting from the Proceedings of CHI 2006. My humble bot extracted the title, the first author, the number of citings (if any), and some other relevant pieces of information. In the absence of alternatives, the key measure of performance here is thus the number of citings in the ACM Digital Library.
Before moving on, please take a minute to read a few words about the data set. First, the proceedings span the years from 1990 to 2006, because when the bot was supposed to move from CHI'90 to CHI'89, the ACM server blacklisted my IP address for repeated downloads :( This does look like a DoS attack, after all. In addition, the server could not always respond to the queries; I calculated that some 20-30 individual publications, the whole year of 1991 (!), and the full-paper proceedings of 1992 are missing due to Server busy errors. Moreover, the Digital Library does not always report the exact number of citings if there are three or fewer citations, which means that there are more 0-citation papers in the data than in reality. Finally, self-citations were not removed. Please take these shortcomings into account when interpreting the results. I intend to revise my code and complete the crawl if people find these statistics useful.
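A revised crawl would probably need to slow down whenever the server complains. The following is only a minimal sketch of that idea, not the bot used for this data: `fetch_politely`, the `ServerBusy` exception and the delay values are all hypothetical, and the actual download function is passed in by the caller.

```python
import time
from typing import Callable, Optional


class ServerBusy(Exception):
    """Raised by the caller-supplied fetch function when the server refuses a request."""


def fetch_politely(
    fetch: Callable[[str], str],
    url: str,
    retries: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try to fetch a page, waiting longer after each failed attempt.

    Returns the page text, or None if every attempt failed, so that the
    caller can record the page as missing and revisit it later.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except ServerBusy:
            # Exponential backoff: 2 s, 4 s, 8 s, ... with the default values.
            sleep(base_delay * 2 ** attempt)
    return None
```

Returning None instead of raising makes it easy to keep a list of the pages that are still missing, such as the 1991 proceedings here.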
1. The most cited first authors 1990-2006
If the number of citings is the measure, the following are the ten most successful people in the history of CHI since 1990. The list contains no surprises: all of these researchers have not only researched a topic but have established important new areas of research.
Note: Some authors have used two or more different versions of their name and may therefore not rank as high on the list as they should. William Gaver / Bill Gaver / William W. Gaver is a perfect example. I did not have time to go through the whole list of almost 3000 authors to fix this issue.
|First author|| |
|Hiroshi Ishii|| |
|Upendra Shardanand|| |
|Peter Pirolli|| |
|Steve Whittaker|| |
|Jun Rekimoto|| |
|John Lamping|| |
|George W. Fitzmaurice|| |
|I. Scott MacKenzie|| |
|Sharon Oviatt|| |
|Steve Benford|| |
If you're interested, you can find the list of ALL first authors here. 2781 authors altogether!
There's a Most frequent authors list available at Gary Perlman's HCI Bibliography. HCIBib's list is perhaps more comprehensive in the sense that it covers more publications and a longer period of time. However, that list is not based on citings, but on the total number of publications. These differences explain why the two lists differ so much.
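The name-variant problem noted above (William Gaver / Bill Gaver / William W. Gaver) could be reduced with a rough normalization step before counting. This is a sketch under stated assumptions, not what was done for the list: the `NICKNAMES` table and both function names are hypothetical, and the citing counts in the usage example are made up.

```python
from collections import defaultdict

# Hypothetical nickname table; a real one would need manual curation.
NICKNAMES = {"bill": "william"}


def name_key(name):
    """Reduce a name to (last name, canonical first name), ignoring middle initials."""
    parts = [p.strip(".").lower() for p in name.split()]
    first, last = parts[0], parts[-1]
    return last, NICKNAMES.get(first, first)


def merge_citings(rows):
    """Sum citings over all spellings that map to the same key."""
    totals = defaultdict(int)
    for name, citings in rows:
        totals[name_key(name)] += citings
    return dict(totals)


# Made-up counts, only to show the merging:
rows = [("William Gaver", 10), ("Bill Gaver", 5), ("William W. Gaver", 7)]
print(merge_citings(rows))  # {('gaver', 'william'): 22}
```

The key is deliberately crude: two different people who share a first and last name would be merged too, so the result would still need a manual check.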
2. The most influential sites of research
In his Alertbox article, Jakob Nielsen judged that the best HCI labs in the 1990s were
Gold: Bell Communications Research (Bellcore)
Silver: Apple Computer Advanced Technology Group
Bronze: Xerox PARC
and in 2000-2010 are going to be
Gold: Microsoft Research
Silver: Xerox PARC
Bronze: Carnegie Mellon University.
According to my bot, the best sites in 1990-2006 have been:
|2.||MIT Media Lab||849|
Thus, the "hard facts" disagree with Nielsen's opinions. Xerox is no doubt the most important site, particularly so in the 1990s. And the MIT Media Lab should be included as the "silver medalist".
Disclaimer: For the time being, these were calculated by merging the affiliations that referred to the same institute only in the list of the top 30 affiliations. By doing this I hoped to address the fact that the naming conventions and the names of the institutes have changed over the years. The list could look different if this were done properly. I am tempted to redo this in the near future.
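Doing the merge "properly" would mean applying one alias table to every affiliation string, not just to the top 30. A minimal sketch of that, assuming a hand-made table: the `ALIASES` entries and function names are hypothetical, and the spellings are examples, not taken from the data.

```python
from collections import Counter

# Hypothetical alias table: each spelling on the left is counted under the
# canonical name on the right. The entries are illustrative only.
ALIASES = {
    "xerox parc": "Xerox PARC",
    "xerox palo alto research center": "Xerox PARC",
    "mit media lab": "MIT Media Lab",
    "mit media laboratory": "MIT Media Lab",
}


def canonical(affiliation):
    """Look the affiliation up case-insensitively, ignoring extra whitespace."""
    key = " ".join(affiliation.lower().split())
    return ALIASES.get(key, affiliation.strip())


def citings_by_site(rows):
    """Sum citings per canonical site, most cited first."""
    totals = Counter()
    for affiliation, citings in rows:
        totals[canonical(affiliation)] += citings
    return totals.most_common()
```

Affiliations that are not in the table pass through unchanged, so the table can grow gradually as new variants turn up.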
And an extract from the stats that might interest us Finns:
- Helsinki University of Technology (incl. my institute HIIT) - 50 citations
- Nokia Research Center - 32
- University of Tampere - 18
- University of Jyväskylä - 13
- University of Helsinki - 2.*
* University of Oulu's CHI paper from 1993 also has 2 citings, but due to the limitations of the ACM DL and my bot, it was not shown in this data although UH's was.
3. The most productive authors
Who has contributed the most pages for the CHI audience to enjoy? Here's a table of the TOP 10 paper machines.
|First author||Total number of pages|
|Brad A. Myers||66|
|Michael J. Muller||52|
|William W. Gaver||49|
|Suresh K. Bhavnani||44|
If you compare this list to list no. 1, you see that the two lists overlap partially. Most of the top researchers of list no. 1 have published many important articles over the years and have thereby secured their claim to fame. However, the high ranking of the two most highly cited authors, Ishii and Shardanand, seems to rest on a single "killer" publication each.
4. The most cited papers
Okay, numbers are nice, but what kinds of papers have been influential? The following table should, if any statistic can, indicate the most significant findings made in this area.
|Paper||First author||Year||Citings|
|Tangible bits||Hiroshi Ishii||1997||313|
|Social information filtering||Upendra Shardanand||1995||187|
|A focus+context technique based on hyperbolic geometry for visualizing large hierarchies||John Lamping||1995||126|
|Bricks||George W. Fitzmaurice||1995||111|
|Email overload||Steve Whittaker||1996||92|
|The WebBook and the Web Forager||Stuart K. Card||1996||92|
|i-LAND||Norbert A. Streitz||1999||87|
|Virtual reality on a WIM||Richard Stoakley||1995||85|
|Augmented surfaces||Jun Rekimoto||1999||78|
|Recommending and evaluating choices in a virtual community of use||Will Hill||1995||77|
Fine, what can one conclude about the most significant contributions to HCI, based on this list? I believe three kinds of contributions have been more important than others:
- Presentation of a novel interaction paradigm (e.g., Tangible bits, Augmented surfaces, Focus+Context)
- Presentation of a novel paradigm for computer-mediated human-human interaction
- Identification of important HCI phenomena in widely used technologies (Email overload)
5. Expected citings
Assuming that you're lucky enough to get a paper pubbed at CHI, how likely is it to become a killer paper like the ones listed above?
First of all, it depends on the type of the paper. In my data there are 4151 papers, of which 1214 are full papers. The average number of citations per paper is quite different in the two categories:
- The average number of citings for full papers is 8.4
- The average number of citings for all papers (incl. full papers) is 3.2
This measure should not be confused with Journal Impact Factor, as it is calculated differently (see this link).
This is of course an average that tells nothing about the distribution. It turns out that a majority, a whopping 61% of full papers, get fewer than 6 citations. See this table:
|Citings||Full papers||% of full papers|
|0||519||43|
|1 - 5||218||18|
|6 - 10||172||14|
|11 - 20||156||13|
|21 - 50||127||10|
|51 or more||22||2|
|Total full papers||1214||100|
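The 61% figure can be checked from the total and the rows for cited papers alone. The snippet below is just that arithmetic, using only numbers from the table; the variable names are mine.

```python
TOTAL_FULL_PAPERS = 1214

# Rows for full papers with at least one reported citing.
cited = {
    "1 - 5": 218,
    "6 - 10": 172,
    "11 - 20": 156,
    "21 - 50": 127,
    "51 or more": 22,
}

# Whatever is left over must be the papers with no reported citings.
uncited = TOTAL_FULL_PAPERS - sum(cited.values())


def percent(count):
    """Share of all full papers, rounded to a whole percent."""
    return round(100 * count / TOTAL_FULL_PAPERS)


print(uncited, percent(uncited))         # 519 43
print(percent(uncited + cited["1 - 5"])) # 61
```

Keep in mind that the uncited group is inflated, because the Digital Library does not always report small citing counts.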
The 20/80 rule by Garfield (the guy who invented the Impact Factor in 1959) seems to apply here as well. According to this rule, the top 20% of papers get 80% of citings. I calculated that in CHI the top 21% of papers receive 76% of citings. Pretty close.
Moreover, the top 5% of papers have "robbed" an amazing 38% of all citations to full papers.
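Concentration figures like these come from sorting the papers by citings and summing from the top. A minimal sketch of that calculation follows; the function name is mine and the sample counts are made up, not taken from the crawl.

```python
def top_share(citings, fraction):
    """Share of all citings received by the top `fraction` of papers."""
    ranked = sorted(citings, reverse=True)
    # Always include at least one paper, even for very small fractions.
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Made-up citing counts for ten papers:
sample = [50, 20, 10, 5, 5, 4, 3, 2, 1, 0]
print(top_share(sample, 0.2))  # 0.7
```

With real data, ties at the cut-off mean the "top 20%" is rarely exactly 20% of the papers, which is probably why the figure above is 21%.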
6. Conclusions
- CHI is the premier forum of HCI research. Full papers receive on average 8 citings in my data, which is a respectable number.
- CHI has been, and still is, an enterprise led by a small band of American universities and IT companies. The gold medal goes to Xerox PARC. The best individual "athlete" is Hiroshi Ishii from MIT.
- The 20/80 rule seems to hold here.
- The law of important articles seems to hold in CHI. According to the law, the number of important articles is the square root of the number of all articles, which in this case would yield 55 “important” full papers. These 55 top full papers (4% of all) received one third of all citations to full papers. Interestingly, according to the same law, the number of “revolutionary” articles is the logarithm of the total number of articles, which in this case would be 3. Ishii’s Tangible Bits is no doubt one of the most important papers in the history of CHI and therefore of all HCI. What would be the other "landmark" articles?
- “The best of CHI” is dominated by constructive contributions. The most influential papers seem to be of two types: 1) a novel interaction paradigm/technique or 2) a novel paradigm to support human-human interaction. The most cited of such papers do not necessarily need to be backed up by rigorous empirical evaluation, because the technology itself has been so powerful a demonstrator. All such papers have opened new vistas for research and laid a foundation for continuity and development in research. Papers providing an empirical contribution, e.g., powerful observations about the use of a powerful technology like email, are salient but in the minority. I believe it would be beneficial to find a better balance between constructive and empirical papers.
- Papers that attempt to develop the theoretical foundation of HCI are not visible. Most research efforts are technology-centered, making it difficult to "see the forest for the trees." As John M. Carroll said in his 1997 Annual Review of Psychology article, HCI needs more integrative work to systematize research efforts.
- This little exercise shows that the scientometrics of HCI, if it is ever to be conducted seriously, will be challenging.