
Monday, April 9, 2012

Gamification is fun

Two prominent examples of gamification in informatics come from bioinformatics, where two groups of researchers have created games that users play to help solve their research problems. The older of the two is FoldIt, where users "fold" proteins. The user with the highest score wins (and has helped the researchers find the best 3D structure for a particular protein). So there's crowdsourcing going on here, too.
Another example is Phylo, where users help researchers perform multiple sequence alignments (MSAs). A lot of bioinformatics reduces to aligning, or pairing up, symbols that stand for different things (amino acids in protein folding like FoldIt, nucleotides in Phylo). In fact, the intro to the Phylo game (the onboarding part) is a really good tutorial on sequence alignment. It's easy for a computer to align two sequences by matching up symbols to maximize a score. It gets tougher for MSAs because the computer needs a good set of heuristics (simplified rules) to make the problem solvable quickly. People can try to out-do the computer on an MSA since they aren't bound by those heuristics: they can spot patterns or try to solve the problem differently.
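
To make the scoring idea concrete, here is a minimal sketch of pairwise alignment scoring in Python. The match, mismatch, and gap values are arbitrary choices for illustration, and this is the textbook dynamic-programming approach rather than anything FoldIt or Phylo actually runs.

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b (Needleman-Wunsch style)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap              # a prefix of a aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap              # a prefix of b aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # align the two symbols
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a
    return score[-1][-1]

print(alignment_score("GATTACA", "GCATGCA"))   # the best score under the default scheme

A computer can fill in this table quickly for two sequences; the trouble starts when many sequences have to be aligned at once.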

Both of these examples are full-blown games that use computing to solve a biology problem. That can be gamification, but gamification doesn't require that something be turned into a full-blown game. It only requires that game elements be incorporated. This last example (I finally got some Freakonomics content into this course) is a blog post illustrating the gamification of identifying objects in pictures. We've seen that computers are not very good at this problem but people are. One approach we saw for solving this was to make it an MTurk HIT and pay someone to do it. Here, we see that just adding a score essentially gets people to solve the problem for free. That's gamification.

(I'll sneak in one last link. Here's how one website gamified itself.)

Monday, March 19, 2012

Data Mining

This post on boingboing describes an effective way Target is using data mining: predicting which customers are pregnant based on their purchases. The original article appeared in the New York Times and can be found here. The boingboing post describes the data mining as "creepy," while this post at the abbot analytics blog looks at it more from the informatics side. In either case, Target has mined data about customers' buying habits to figure out, with good confidence, which customers are pregnant.

Clustering is one of the core parts of data mining. In clustering, a set of points is split into clusters, where each cluster presumably contains related data. The linked article is a little technical, but it describes how to cluster users based on musical preferences.
In classification, you start with known groups and want to decide, based on its attributes, which group a new piece of data belongs in. So clustering works with all the points and finds the groups, while classification assumes the groups are known and places a new point; the two go hand in hand, as the sketch below illustrates.
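
Here is a toy sketch of both ideas in plain Python. The points, the use of a simple k-means loop for clustering, and nearest-centroid classification are my own illustration, not anything taken from the linked article.

import math

points = [(1, 1), (1.5, 2), (1, 0.5),      # one natural group
          (8, 8), (8.5, 9), (9, 8)]        # another natural group

def kmeans(points, k=2, iterations=10):
    """Clustering: split the points into k groups around moving centroids."""
    centroids = list(points[:k])           # simple deterministic starting guesses
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                     if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def classify(point, centroids):
    """Classification: assign a new point to the nearest existing group."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

centroids, clusters = kmeans(points)       # clustering discovers the two groups
print(classify((8, 7), centroids))         # the new point lands in the (8, 8)-ish group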

Wednesday, February 29, 2012

Tim Berners-Lee

Tim Berners-Lee invented the Web, in some sense, as we know. In this recent Wired article, that is expanded on a bit in the context of courtroom testimony. He is also on the record as saying that perhaps World Wide Web wasn't the best name, since the structure of the web is really that of a graph, so the acronym GGG (Giant Global Graph) might be more appropriate. Here's his blog post originating that idea. A graph (also sometimes called a network) is a mathematical model that fits lots of the digital aspects of our lives we take for granted: online friendships, the web itself, viral videos, and so on.

Here is the Internet shown as a graph:
and here is Facebook.
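
The same idea is easy to write down in code. Here is a minimal sketch of a graph stored as an adjacency list in Python; the people and friendships are made up for illustration.

# A tiny "social graph": each person maps to the list of people they're connected to.
friendships = {
    "Alice": ["Bob", "Carol"],
    "Bob":   ["Alice", "Dave"],
    "Carol": ["Alice"],
    "Dave":  ["Bob"],
}

def friends_of_friends(graph, person):
    """People two hops away: friends of my friends who aren't already my friends."""
    direct = set(graph[person])
    two_hops = {fof for friend in direct for fof in graph[friend]}
    return two_hops - direct - {person}

print(friends_of_friends(friendships, "Alice"))   # {'Dave'}

The web fits the same model: pages are the nodes and hyperlinks are the edges.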

Monday, February 13, 2012

NASA has unplugged its last mainframe. That's one picture of the direction computing has been headed for the last 30 years, but mainframes still exist to serve other needs more closely related to enterprise-level data processing. Don't worry, though: NASA still has some big computer clusters.
Two slashdotted articles in a row relate to this section of I101: the shift to IP is freeing up real estate for old companies like AT&T, and the importance of supercomputers to weather prediction has been highlighted.

Wednesday, February 8, 2012

Here's a cheesy video that describes how the Internet's TCP/IP backbone works:
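
For something more hands-on, here is a minimal sketch (assuming Python's standard socket module) of what riding on top of TCP/IP looks like from a program's point of view: open a TCP connection to a web server and ask for a page.

import socket

# Open a TCP connection to a web server (port 80 is the standard HTTP port).
with socket.create_connection(("example.com", 80)) as conn:
    # Speak HTTP over that TCP connection: ask for the front page.
    conn.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    reply = conn.recv(4096)                       # read the first chunk of the reply

print(reply.decode(errors="replace")[:200])       # the start of the server's response

Everything underneath that (packets, routing, retransmission) is what the video is describing.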

A timely Wired article describes the importance of APIs to the Web. Ever wonder how websites all over the place have recently started letting you share things on Facebook?
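
Under the hood, most web APIs are just structured HTTP requests that return structured data. Here is a hedged sketch using Python's standard urllib; the endpoint URL and the "count" field are hypothetical placeholders, not any real site's API, so the network call is left commented out.

import json
import urllib.request

url = "https://api.example.com/v1/share-count?url=http://example.org/some-post"

def fetch_share_count(api_url):
    """An API call is an ordinary HTTP GET whose reply is structured data (JSON)."""
    with urllib.request.urlopen(api_url) as response:
        return json.loads(response.read().decode()).get("count")

# print(fetch_share_count(url))   # would print the hypothetical share count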

Wednesday, February 1, 2012

Cyberinfrastructure

Cyberinfrastructure has become a buzzword for the infrastructure needed for sophisticated information systems. A recent article that got slashdotted describes one aspect of the infrastructure for a Facebook data center: the power. The data center in Oregon uses half the total power of its county, and expanding it involves working with the power company to add capacity to the power grid. Why would the power company put up with that? Because the economic value to the local economy runs to tens of millions of dollars. As for Facebook's own value, the company is rumored to be filing for an IPO today.


The world's fastest computers are listed in the Top 500. These information systems are at the cutting edge of high-performance computing (HPC). The current shape HPC has taken is a direct consequence of Moore's Law and the basic physics that things heat up faster at higher frequencies. As components on a chip get packed closer together every 18 to 24 months, they also used to be clocked faster. Running faster makes them heat up more, but when they're packed that close together, all that heat has nowhere to escape in time. So individual CPUs don't really get faster anymore; instead, machines get more of them. One future entry at or near the top of the Top 500 is not too far from here: Blue Waters is finally being installed at NCSA in Illinois.
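
To put a rough number on the heat argument, a common rule of thumb (my own addition here, not something from the Top 500 list) is that a chip's dynamic power grows roughly like capacitance times voltage squared times frequency, and higher clock frequencies tend to require higher voltages:

# Back-of-the-envelope sketch of the dynamic-power rule of thumb P ~ C * V^2 * f.
# The numbers are made up purely for illustration.
def dynamic_power(capacitance, voltage, frequency):
    return capacitance * voltage**2 * frequency

baseline = dynamic_power(1.0, 1.0, 2.0e9)     # a notional 2 GHz chip
faster   = dynamic_power(1.0, 1.2, 3.0e9)     # 3 GHz, needing a bit more voltage

print(faster / baseline)    # about 2.2x the power (and heat) for 1.5x the clock speed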

Wednesday, January 25, 2012

Ubiquitous Information

Google has made a new "privacy announcement" (outlined here in the Washington Post) stating that users can now expect to have data about their usage of all Google products tracked. Previously, a user's search usage and a user's usage of other Google products would have been tracked separately. Now, Google is stating that the data collected from using the separate Google products (search, GMail, YouTube, ...) will be aggregated to yield a better pool of information. Google is not giving users the ability to opt out of this data collection.

What benefit does this have for Google? What benefit does this have for the user? What drawbacks might it have for both?



Despite sensationalistic articles or headlines, data collection is everywhere; it is ubiquitous. What kind of data might Amazon collect about a user, for instance? When one speaks of ubiquitous computing, though, one usually means the computers and the network being everywhere rather than the data collection. Smartphones are a sign that computers are becoming ubiquitous, since one can be online with them almost anywhere, anytime. Another way in which computers can be ubiquitous is in the field of sensor informatics. Sensors collect data, and when integrated with a computer, they can store this data or transmit it over a network. If sensors are ubiquitous, there will be sensors everywhere. What applications of this can you think of? Using this ubiquitous hardware and computing power to solve problems is informatics: sensor informatics. Wired has one sample application that may be viable soon.
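
As a rough sketch of what a single sensor node might do, here is some Python; the read_temperature function and the collection URL are hypothetical placeholders I am inventing for illustration, so the actual network call is left commented out.

import json
import time
import urllib.request

def read_temperature():
    """Stand-in for reading real hardware; here it just returns a fixed value."""
    return 21.5   # degrees Celsius

# Take a reading, timestamp it, and package it up to send over the network.
reading = {"sensor_id": "room-101", "celsius": read_temperature(), "time": time.time()}
request = urllib.request.Request(
    "https://sensors.example.com/readings",        # hypothetical collection endpoint
    data=json.dumps(reading).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request)                  # the endpoint above doesn't exist
print(reading)

Multiply that by thousands of nodes and the interesting problems become storing, aggregating, and making sense of all the readings.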

Monday, January 23, 2012

Apple for the Teacher

Last Thursday, Apple announced it was making new textbook creation and consumption tools available: iBooks Author and iBooks 2, respectively. By the next day, the Internet was up in arms about one clause restricting the possible uses of the software. What do we think?
Also, if you look at the Tech Blog at the L.A. Times, you'll note in the article after the one about Apple that online advertising is at an all-time high.

Wednesday, January 18, 2012

The Web is Turned Off Today

Taking a quick look at Google or Wikipedia shows us that all is not well on the web today. Other key sites and many smaller ones are also deliberately disabled or showing a different logo today in protest.

These protests are in response to the proposed Protect Intellectual Property Act (PIPA) and the Stop Online Piracy Act (SOPA) legislation before Congress. Much of the opposition to this legislation centers on the overwhelming amount of control that the acts give to copyright holders and the possibly unpredictable effects this could have on the development of the Internet.

Leaving aside ethics debates for now, what do we do if we need a Wikipedia article today? Well, we can still get to it through Google or by knowing the URL directly, since Wikipedia doesn't seem to have shut down their entire site. It's also a good time to talk about how informatics systems catalog the web for us. Google maintains cached copies (which is useful for research, but how useful is it for your online privacy?), and there is something called the Wayback Machine whose very purpose is to catalog how websites change over time. In each of these cases, a vast, vast amount of data in a data warehouse is holding this information until we ask for it, and that data has already been processed to make searching over it possible in the first place.
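
As a small sketch of what asking such a data warehouse looks like from a program, here is a query against the Wayback Machine's availability endpoint in Python. I am assuming the archive.org/wayback/available URL and the shape of its JSON reply, so treat the details as illustrative rather than definitive.

import json
import urllib.parse
import urllib.request

page = "en.wikipedia.org/wiki/Informatics"
url = "https://archive.org/wayback/available?url=" + urllib.parse.quote(page, safe="")

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())

# If an archived snapshot exists, its URL should be nested under "archived_snapshots".
snapshot = data.get("archived_snapshots", {}).get("closest", {})
print(snapshot.get("url", "no archived copy found"))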

Finally, here is an interview with Conrad Wolfram on the Knowledge Economy. Conrad is the brother of Stephen Wolfram, who created the software Mathematica. Stephen's company, Wolfram Research, is also the driving force behind the natural-language search engine Wolfram Alpha. Informatics centers on the interactions between people and technology, and natural-language search is one way of trying to enhance those interactions. Another is gesture-based interfaces, which are the subject of a recent LA Times article and something we'll talk more about later this semester. Note the use of the term disruptive innovation.

Monday, January 9, 2012

Welcome to the Spring 2012 Semester of INFO I-101

This blog will serve as our data warehouse for looking at current events and trends in informatics and computing. A good place to start is an article written for the Association for Computing Machinery's magazine describing what informatics is: Why an Informatics Degree by D.P. Groth and J.K. MacKie-Mason. This article may be behind a paywall, but it should work on campus.

Informatics is practiced by informaticists or informaticians (it's up to you which one you like best and want to use), and it is the application of computing and information technology to other fields such as biology, chemistry, business, sociology, or psychology, to name just a few. Informatics focuses on three things: the people, the technology, and the information involved. To be a successful informaticist, one must bring together skills from many areas.


One example of an informatics system was in the NY Times yesterday: medical students are using a 3D informatics system to supplement learning anatomy, which usually involves dissecting cadavers. This is an example of medical informatics, though the terms medical informatics and health informatics most commonly refer to issues associated with medical record keeping, with a heavy focus on the information associated with patients. In the example here, informatics is being used to, hopefully, improve a future doctor's training.



Finally, here is one example of a "feud" between Twitter and Google indicating that information is indeed something valuable.