Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
Covers models of computation, complexity classes, structural complexity, complexity tradeoffs, upper and lower bounds. Roughly includes material in ACM Subject Classes F.1 (computation by abstract devices), F.2.3 (tradeoffs among complexity measures), and F.4.3 (formal languages), although some material in formal languages may be more appropriate for Logic in Computer Science. Some material in F.2.1 and F.2.2 may also be appropriate here, but is more likely to have Data Structures and Algorithms as the primary subject area.
Covers applications of computer science to the mathematical modeling of complex systems in the fields of science, engineering, and finance. Papers here are interdisciplinary and applications-oriented, focusing on techniques and tools that enable challenging computational simulations to be performed, for which the use of supercomputers or distributed computing platforms is often required. Includes material in ACM Subject Classes J.2, J.3, and J.4 (economics).
Roughly includes material in ACM Subject Classes I.3.5 and F.2.2.
Covers all theoretical and applied aspects at the intersection of computer science and game theory, including work in mechanism design, learning in games (which may overlap with Learning), foundations of agent modeling in games (which may overlap with Multiagent systems), coordination, specification and formal methods for non-cooperative computational environments. The area also deals with applications of game theory to areas such as electronic commerce.
Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
Covers impact of computers on society, computer ethics, information technology and public policy, legal aspects of computing, computers and education. Roughly includes material in ACM Subject Classes K.0, K.2, K.3, K.4, K.5, and K.7.
Covers all areas of cryptography and security including authentication, public key cryptosystems, proof-carrying code, etc. Roughly includes material in ACM Subject Classes D.4.6 and E.3.
Covers data structures and analysis of algorithms. Roughly includes material in ACM Subject Classes E.1, E.2, F.2.1, and F.2.2.
Covers database management, datamining, and data processing. Roughly includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1.
Covers all aspects of the digital library design and document and text creation. Note that there will be some overlap with Information Retrieval (which is a separate subject area). Roughly includes material in ACM Subject Classes H.3.5, H.3.6, H.3.7, I.7.
Covers combinatorics, graph theory, applications of probability. Roughly includes material in ACM Subject Classes G.2 and G.3.
Covers fault-tolerance, distributed algorithms, stability, parallel computation, and cluster computing. Roughly includes material in ACM Subject Classes C.1.2, C.1.4, C.2.4, D.1.3, D.4.5, D.4.7, and E.1.
Covers approaches to information processing (computing, communication, sensing) and bio-chemical analysis based on alternatives to silicon CMOS-based technologies, such as nanoscale electronic, photonic, spin-based, superconducting, mechanical, bio-chemical and quantum technologies (this list is not exclusive). Topics of interest include (1) building blocks for emerging technologies, their scalability and adoption in larger systems, including integration with traditional technologies, (2) modeling, design and optimization of novel devices and systems, (3) models of computation, algorithm design and programming for emerging technologies.
Covers automata theory, formal language theory, grammars, and combinatorics on words. This roughly corresponds to ACM Subject Classes F.1.1 and F.4.3. Papers dealing with computational complexity should go to cs.CC; papers dealing with logic should go to cs.LO.
Covers introductory material, survey material, predictions of future trends, biographies, and miscellaneous computer-science related material. Roughly includes all of ACM Subject Class A, except it does not include conference proceedings (which will be listed in the appropriate subject area).
Covers all aspects of computer graphics. Roughly includes material in all of ACM Subject Class I.3, except that I.3.5 is likely to have Computational Geometry as the primary subject area.
Covers systems organization and hardware architecture. Roughly includes material in ACM Subject Classes C.0, C.1, and C.5.
Covers human factors, user interfaces, and collaborative computing. Roughly includes material in ACM Subject Classes H.1.2 and all of H.5, except for H.5.1, which is more likely to have Multimedia as the primary subject area.
Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
Covers machine learning and computational (PAC) learning. Roughly includes material in ACM Subject Class I.2.6.
Covers all aspects of logic in computer science, including finite model theory, logics of programs, modal logic, and program verification. Programming language semantics should have Programming Languages as the primary subject area. Roughly includes material in ACM Subject Classes D.2.4, F.3.1, F.4.0, F.4.1, and F.4.2; some material in F.4.3 (formal languages) may also be appropriate here, although Computational Complexity is typically the more appropriate subject area.
Roughly includes material in ACM Subject Class G.4.
Covers multiagent systems, distributed artificial intelligence, intelligent agents, coordinated interactions, and practical applications. Roughly covers ACM Subject Class I.2.11.
Roughly includes material in ACM Subject Class H.5.1.
Covers all aspects of computer communication networks, including network architecture and design, network protocols, and internetwork standards (like TCP/IP). Also includes topics, such as web caching, that are directly relevant to Internet architecture and performance. Roughly includes all of ACM Subject Class C.2 except C.2.4, which is more likely to have Distributed, Parallel, and Cluster Computing as the primary subject area.
Covers neural networks, connectionism, genetic algorithms, artificial life, adaptive behavior. Roughly includes some material in ACM Subject Classes C.1.3, I.2.6, and I.5.
Roughly includes material in ACM Subject Class G.1.
Roughly includes material in ACM Subject Classes D.4.1, D.4.2, D.4.3, D.4.4, D.4.5, D.4.7, and D.4.9.
This is the classification to use for documents that do not fit anywhere else.
Covers performance measurement and evaluation, queueing, and simulation. Roughly includes material in ACM Subject Classes D.4.8 and K.6.2.
Covers programming language semantics, language features, programming approaches (such as object-oriented programming, functional programming, logic programming). Also includes material on compilers oriented towards programming languages; other material on compilers may be more appropriate in Architecture (AR). Roughly includes material in ACM Subject Classes D.1 and D.3.
Roughly includes material in ACM Subject Class I.2.9.
Covers the design, analysis, and modeling of social and information networks, including their applications for on-line information access, communication, and interaction, and their roles as datasets in the exploration of questions in these and other domains, including connections to the social and biological sciences. Analysis and modeling of such networks includes topics in ACM Subject classes F.2, G.2, G.3, H.2, and I.2; applications in computing include topics in H.3, H.4, and H.5; and applications at the interface of computing and other disciplines include topics in J.1--J.7. Papers on computer communication systems and network protocols (e.g. TCP/IP) are generally a closer fit to the Networking and Internet Architecture (cs.NI) category.
Covers design tools, software metrics, testing and debugging, programming environments, etc. Roughly includes material in all of ACM Subject Class D.2, except that D.2.4 (program verification) should probably have Logic in Computer Science as the primary subject area.
Covers all aspects of computing with sound, and sound as an information channel. Includes models of sound, analysis and synthesis, audio user interfaces, sonification of data, computer music, and sound signal processing. Includes ACM Subject Class H.5.5, and intersects with H.1.2, H.5.1, H.5.2, I.2.7, I.5.4, I.6.3, J.5, K.4.2.
Roughly includes material in ACM Subject Class I.1.
This section includes theoretical and experimental research covering all facets of automatic control systems, having as focal point analysis and design methods using tools of modeling, simulation and optimization. Specific areas of research include nonlinear, distributed, adaptive, stochastic and robust control, hybrid and discrete event systems. Application areas include automotive, aerospace, process control, network control, biological systems, multiagent and cooperative control, sensor networks, control of cyberphysical and energy-related systems, control of computing systems.
Since we recently announced our $10001 Binary Battle to promote applications built on the Mendeley API (now including PLoS as well), I decided to take a look at the data to see what people have to work with. My analysis focused on our second largest discipline, Computer Science. Biological Sciences (my discipline) is the largest, but I started with this one so that I could look at the data with fresh eyes, and also because it’s got some really cool papers to talk about. Here’s what I found:
What I found was a fascinating list of topics, with many of the expected fundamental papers like Shannon’s Theory of Information and the Google paper, a strong showing from MapReduce and machine learning, but also some interesting hints that augmented reality may be becoming more of an actual reality soon.
The top graph summarizes the overall results of the analysis. This graph shows the Top 10 papers among those who have listed computer science as their discipline and chosen a subdiscipline. The bars are colored according to subdiscipline and the number of readers is shown on the x-axis. The bar graphs for each paper show the distribution of readership levels among subdisciplines. 17 of the 21 CS subdisciplines are represented and the axis scales and color schemes remain constant throughout. Click on any graph to explore it in more detail or to grab the raw data. (NB: A minority of Computer Scientists have listed a subdiscipline. I would encourage everyone to do so.)
1. Latent Dirichlet Allocation (available full-text)
LDA is a means of classifying objects, such as documents, based on their underlying topics. I was surprised to see this paper as number one instead of Shannon’s information theory paper (#7) or the paper describing the concept that became Google (#3). It turns out that interest in this paper is very strong among those who list artificial intelligence as their subdiscipline. In fact, AI researchers contributed the majority of readership to 6 out of the top 10 papers. Presumably, those interested in popular topics such as machine learning list themselves under AI, which explains the strength of this subdiscipline, whereas papers like the MapReduce one or the Google paper appeal to a broad range of subdisciplines, giving those papers smaller numbers spread across more subdisciplines. Professor Blei is also a bit of a superstar, so that didn’t hurt. (The irony of a manually-categorized list with an LDA paper at the top has not escaped us.)
2. MapReduce: Simplified Data Processing on Large Clusters (available full-text)
It’s no surprise to see this in the Top 10 either, given the huge appeal of this parallelization technique for breaking down huge computations into easily executable and recombinable chunks. The importance of the monolithic “Big Iron” supercomputer has been on the wane for decades. The interesting thing about this paper is that it had some of the lowest readership scores of the top papers within a subdiscipline, but folks from across the entire spectrum of computer science are reading it. This is perhaps expected for such a general purpose technique, but given the above it’s strange that there are no AI readers of this paper at all.
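To make the "executable and recombinable chunks" idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It is only an illustration of the general pattern, not the implementation from the paper; the function names and the sample documents are made up for this example, and a real deployment would run the map and reduce calls on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input, chosen only for illustration.
counts = word_count(["big data big clusters", "big iron"])
print(counts)
```

Because every map call looks at one document and every reduce call looks at one key, neither step needs to know about the rest of the data, which is what makes the work easy to spread out.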
3. The Anatomy of a large-scale hypertextual search engine (available full-text)
In this paper, Google founders Sergey Brin and Larry Page discuss how Google was created and how it initially worked. This is another paper that has high readership across a broad swath of disciplines, including AI, but wasn’t dominated by any one discipline. I would expect that the largest share of readers have it in their library mostly out of curiosity rather than direct relevance to their research. It’s a fascinating piece of history related to something that has now become part of our everyday lives.
4. Distinctive Image Features from Scale-Invariant Keypoints
This paper was new to me, although I’m sure it’s not new to many of you. This paper describes how to identify objects in a video stream without regard to how near or far away they are or how they’re oriented with respect to the camera. AI again drove the popularity of this paper in large part and to understand why, think “Augmented Reality”. AR is the futuristic idea most familiar to the average sci-fi enthusiast as Terminator-vision. Given the strong interest in the topic, AR could be closer than we think, but we’ll probably use it to layer Groupon deals over shops we pass by instead of building unstoppable fighting machines.
5. Reinforcement Learning: An Introduction (available full-text)
This is another machine learning paper and its presence in the top 10 is primarily due to AI, with a small contribution from folks listing neural networks as their discipline, most likely due to the paper being published in IEEE Transactions on Neural Networks. Reinforcement learning is essentially a technique that borrows from biology, where the behavior of an intelligent agent is controlled by the amount of positive stimuli, or reinforcement, it receives in an environment where there are many different interacting positive and negative stimuli. This is how we’ll teach the robots behaviors in a human fashion, before they rise up and destroy us.
6. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions (available full-text)
Popular among AI and information retrieval researchers, this paper discusses recommendation algorithms and classifies them into collaborative, content-based, or hybrid. While I wouldn’t call this paper a groundbreaking event of the caliber of the Shannon paper above, I can certainly understand why it makes such a strong showing here. If you’re using Mendeley, you’re using both collaborative and content-based discovery methods!
7. A Mathematical Theory of Communication (available full-text)
Now we’re back to more fundamental papers. I would really have expected this to be at least number 3 or 4, but the strong showing by the AI discipline for the machine learning papers in spots 1, 4, and 5 pushed it down. This paper discusses the theory of sending communications down a noisy channel and demonstrates a few key engineering parameters, such as entropy, which measures the average uncertainty, or information content, of a message source. It’s one of the more fundamental papers of computer science, founding the field of information theory and enabling the development of the very tubes through which you received this web page you’re reading now. It’s also the first place the word “bit”, short for binary digit, is found in the published literature.
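For a feel of what entropy measures, here is a small sketch that estimates the Shannon entropy of a message, in bits per symbol, from the frequencies of its characters. The function name and the sample strings are my own choices for illustration; the formula itself, H = -Σ p·log2(p), is the standard definition.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Estimate entropy in bits per symbol from observed symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # Sum p * log2(1/p) over every distinct symbol; an empty message gives 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Two equally likely symbols need one bit each; four need two bits each.
print(shannon_entropy("aabb"))
print(shannon_entropy("abcd"))
# A message that never varies carries no information.
print(shannon_entropy("aaaa"))
```

The more evenly spread the symbols are, the higher the entropy, and the more bits are needed on average to send each one.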
8. The Semantic Web (available full-text)
In The Semantic Web, Tim Berners-Lee, Sir Tim, the inventor of the World Wide Web, describes his vision for the web of the future. Now, 10 years later, it’s fascinating to look back through it and see on which points the web has delivered on its promise and how far away we still remain in so many others. This is different from the other papers above in that it’s a descriptive piece, not primary research as above, but still deserves its place in the list and readership will only grow as we get ever closer to his vision.
9. Convex Optimization (available full-text)
This is a very popular book on a widely used optimization technique in signal processing. Convex optimization tries to find the provably optimal solution to an optimization problem, as opposed to a nearby maximum or minimum. While this seems like a highly specialized niche area, it’s of importance to machine learning and AI researchers, so it was able to pull in a nice readership on Mendeley. Professor Boyd has a very popular set of video classes at Stanford on the subject, which probably gave this a little boost, as well. The point here is that print publications aren’t the only way of communicating your ideas. Videos of techniques at SciVee or JoVE or recorded lectures (previously) can really help spread awareness of your research.
10. Object recognition from local scale-invariant features (available full-text)
This is another paper on the same topic as paper #4, and it’s by the same author. Looking across subdisciplines as we did here, it’s not surprising to see two related papers, of interest to the main driving discipline, appear twice. Adding the readers from this paper to the #4 paper would be enough to put it in the #2 spot, just below the LDA paper.
So what’s the moral of the story? Well, there are a few things to note. First of all, it shows that Mendeley readership data is good enough to reveal both papers of long-standing importance as well as interesting upcoming trends. Fun stuff can be done with this! How about a Mendeley leaderboard? You could grab the number of readers for each paper published by members of your group, and have some friendly competition to see who can get the most readers, month-over-month. Comparing yourself against others in terms of readers per paper could put a big smile on your face, or it could be a gentle nudge to get out to more conferences or maybe record a video of your technique for JoVE or Khan Academy or just YouTube.
Another thing to note is that these results don’t necessarily mean that AI researchers are the most influential researchers or the most numerous, just the best at being accounted for. To make sure you’re counted properly, be sure you list your subdiscipline on your profile, or if you can’t find your exact one, pick the closest one, like the machine learning folks did with the AI subdiscipline. We recognize that almost everyone does interdisciplinary work these days. We’re working on a more flexible discipline assignment system, but for now, just pick your favorite one.
These stats were derived from the entire readership history, so they do reflect a founder effect to some degree. Limiting the analysis to the past 3 months would probably reveal different trends and comparing month-to-month changes could reveal rising stars.
To do this analysis I queried the Mendeley database, analyzed the data using R, and prepared the figures with Tableau Public. A similar analysis can be done dynamically using the Mendeley API. The API returns JSON, which can be imported into R using the fine RJSONIO package from Duncan Temple Lang, and Carl Boettiger is implementing the Mendeley API in R. You could also interface with the Google Visualization API to make motion charts showing a dynamic representation of this multi-dimensional data. There’s all kinds of stuff you could do, so go have some fun with it. I know I did.
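If R isn’t your thing, the same kind of tally is easy to sketch in Python once you have the JSON in hand. The example below is only a rough illustration: the payload is made-up sample data, and the field names (`title`, `readers_by_subdiscipline`) are assumptions for this sketch, not the actual shape of the Mendeley API response, so check the API documentation before adapting it.

```python
import json

# Hypothetical payload; the structure and field names are assumed for illustration.
SAMPLE_RESPONSE = """
[
  {"title": "Paper A",
   "readers_by_subdiscipline": {"Artificial Intelligence": 40, "Information Retrieval": 10}},
  {"title": "Paper B",
   "readers_by_subdiscipline": {"Networks": 15, "Databases": 20, "Artificial Intelligence": 5}},
  {"title": "Paper C",
   "readers_by_subdiscipline": {"Artificial Intelligence": 30}}
]
"""


def rank_by_readers(payload):
    """Return (title, total readers) pairs, most-read paper first."""
    papers = json.loads(payload)
    totals = [(p["title"], sum(p["readers_by_subdiscipline"].values())) for p in papers]
    return sorted(totals, key=lambda item: item[1], reverse=True)


def top_subdiscipline(payload):
    """Return, for each title, the subdiscipline contributing the most readers."""
    papers = json.loads(payload)
    return {
        p["title"]: max(p["readers_by_subdiscipline"], key=p["readers_by_subdiscipline"].get)
        for p in papers
    }


ranking = rank_by_readers(SAMPLE_RESPONSE)
leaders = top_subdiscipline(SAMPLE_RESPONSE)
print(ranking)
print(leaders)
```

Swapping the sample string for a real response, and the titles for the papers from your own group, would give you the raw numbers for a leaderboard.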