Monday, March 19, 2007

Release 0.9: Metaweb - Emergent Structure vs. Intelligent Design

This is the essay I was born to write... Or rather, this is the essay I was nurtured to write. I have spent decades looking at structured databases, powerful and rigid, and at text, infinitely malleable and flexible, but not terribly useful without a human to interpret it. [See disclosure at end.]

Long ago, I met Danny Hillis and David Waltz at Thinking Machines, the company Danny founded. It was one of those "AI" companies, focused on large text corpuses with clients such as Dow Jones. I'll always remember David's comment: "Words are not in themselves carriers of meaning, but merely pointers to shared understandings." In other words, the meaning is in people's heads.

Now Danny has a new company, Metaweb, which announced its first product (Freebase) on Friday. The world hasn't changed forever, but Freebase is a milestone in the journey towards representing meaning in computers.

The company calls Freebase a "data commons": The public is invited to add data and even structure.

What is it? It's basically an extensive tool to represent the world in a way that can be understood by computers as well as by people. The excitement is not that it can support better search, but that it can support more powerful applications. Rather than present information to humans so that they can figure out what to do with it, it represents information in a way that lets computers manipulate it.

For example, suppose you want to plan a trip to Moscow (or imagine your own favorite information-intensive task that involves integrating information from several sources, making a few transactions, and ending up with some complex task accomplished). You may search for information about venues and hotels. You will check your schedule to see what appointments you have to plan, and perhaps look at Google or Yandex maps to minimize your travel (and time spent in traffic). But in the end, you don't really want search results: You want to book hotels, schedule appointments, communicate with the people you're going to visit.

All this requires a lot of understanding of how locations, schedules, people, appointments and even expense reporting interact. If you're setting up a meeting with Bernie Sucher at Merrill Lynch, you need to know the location of Merrill's Moscow office. You need to know that it will take three hours to get into town, that you swim every morning so your first appointment can't start until 90 minutes after the opening of the swimming pool. And so on.
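To make the swimming constraint concrete, here is a minimal sketch of how a computer might hold that one rule. The pool's opening time and the date are invented for illustration; only the 90-minute gap comes from the example above.

```python
from datetime import datetime, timedelta

# Assumption for illustration: the pool opens at 7:00 on an arbitrary day.
POOL_OPENS = datetime(2007, 3, 19, 7, 0)

# From the example: no appointment until 90 minutes after the pool opens.
SWIM_BUFFER = timedelta(minutes=90)


def earliest_first_appointment(pool_opens, buffer=SWIM_BUFFER):
    """Return the earliest time the first appointment of the day may start."""
    return pool_opens + buffer


def appointment_fits(start, pool_opens):
    """True if a proposed start time leaves room for the morning swim."""
    return start >= earliest_first_appointment(pool_opens)


earliest = earliest_first_appointment(POOL_OPENS)
```

A real trip planner would combine many rules like this one (travel time, office locations, other people's calendars), which is exactly why the relationships need to be explicit rather than buried in text.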

This is just one example...but it shows exactly how complicated a simple thing can be. Things are part of other things; they consume or depend on or interact with other things in particular ways. The trick (and Metaweb's goal) is to represent that complexity with enough specificity and precision that a computer can manipulate it. So you can go beyond finding information; you can direct a computer to manipulate it on your behalf.

That's the big win: not better search, but the ability actually to leverage the information and go beyond search to action. In a way, when you do things (as opposed to searching for information) online, you're designing a solution or constructing a complex situation. If the software knows enough to fit the pieces together with some brief instructions from you, that's a big win.

Putting Metaweb in context

At one end, you have the World Wide Web, plain text or even Wikipedia. You can match things on the basis of text strings (including thesauruses) or links. As a human being, you can read and learn from the texts and images. You can also navigate links if you are a person, or, if you're a computer, you can use the presence of links to select or rank items to be presented to the user (including complicated weightings on the basis of how many links those linking sites get). But in the end, the result of the computer's efforts is a ranking or possibly a set of clusters of items. You don't get any kind of structure or relationship: Is the child father to the man? Does the dog chase the bus or the other way around? You don't know...unless, as a human, you read the text.

At the other end, you can have an Oracle (or MySQL) database with multiple tables. The tables are cross-indexed, so that a table of people could list their friends or their direct reports... but it's extremely hard for a normal person to add new kinds of relationships. And it's clunky and difficult to represent the complexity of the real world.

In the middle is Metaweb, which has the flexibility of text with the power of the database.

To illustrate further, take a car: The text/image approach would simply represent it as a monolithic object - "a car." There could be descriptions of it, tags such as "red" or a price, and photos, but in the end, nothing that would mean much until a human looked at them. A computer could search for prices, but it might confuse the price of the optional Garmin navigator with the price of the car.

By contrast, the database would represent the car as a set of tables (or perhaps some geometrical vectors). Imagine that you had to take your car apart every night and store all its pieces on separate shelves in your garage, and then reassemble it in the morning. That's what working with a database is like.

Meanwhile, Metaweb lets you represent it whole yet still complex: The car is a unified thing, but you can still see the individual parts and how they fit together. You could also specify where they come from and what they cost.
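As a rough illustration of "whole yet still complex," here is a small sketch of a car as one object with typed links to its parts and options. This is not Metaweb's actual data model or API; the `Thing` class, the relationship names and every price and origin below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    """A node with a name, a kind, its own properties, and typed links to other things."""
    name: str
    kind: str
    properties: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # list of (relationship, Thing) pairs

    def link(self, relationship, other):
        """Attach another thing under a named relationship and return it."""
        self.links.append((relationship, other))
        return other

    def related(self, relationship):
        """Return every thing attached under the given relationship."""
        return [thing for rel, thing in self.links if rel == relationship]


# Hypothetical data: the car is one thing, but its pieces stay visible.
car = Thing("car", "automobile", {"color": "red", "price_usd": 20000})
engine = car.link("has_part", Thing("engine", "component",
                                    {"made_in": "Germany", "price_usd": 4000}))
navigator = car.link("has_option", Thing("Garmin navigator", "accessory",
                                         {"price_usd": 500}))

# The car's price and the navigator's price live on different things,
# so a program cannot mistake one for the other.
car_price = car.properties["price_usd"]
option_prices = {t.name: t.properties["price_usd"] for t in car.related("has_option")}
part_origins = {t.name: t.properties["made_in"] for t in car.related("has_part")}
```

The point of the sketch is only that the price of the navigator is attached to the navigator, and the navigator is attached to the car by a named relationship, so nothing has to be guessed from surrounding text.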

Yes, you could do that in a Wikipedia entry too, of course, or you could add another table in Oracle. The difference in Metaweb is how easy it is to do that in a scalable way... In short, Metaweb has an easy grammar for extending itself.


Reprise: Emergent structure vs. intelligent design

This all reflects a fundamental if still incoherent debate. There's one school of thought that says that if you just collect enough data and throw enough algorithms at it, the inherent structure - and the understanding of that structure - will emerge. After all, that's what happens with human beings, though it takes a decade or more. (And in some people, the process even continues into old age.) The recent explosion of tagging is taken as evidence of this: With their tags, users are creating implicit relationships among online objects, and indeed, complex webs of relationships are emerging, with nodes, clusters and other rich structures. But the relationships themselves are poorly defined, other than strong or weak - and possibly, links made by my friends or by trusted authorities, vs. links created by anyone.

By contrast, the opposing point of view says we have to hand-design the relationships and structures - like the complex database schema about cars.

Where Metaweb differs from that approach - and from "ontology" projects such as Cyc - is this: Metaweb's creators have "intelligently designed" the grammar of how the relationships are specified, but they are relying on the wisdom (or the specific knowledge) and the efforts of the crowd to create the actual content - not just specific data, but specific kinds of relationships between specific things.

Metaweb has a process (including editing and approval) for people both to define relationships, and to use those relationships to describe specific instances.
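The two-step idea (first define a kind of relationship, then use it for specific instances) can be sketched as follows. This is an illustration under stated assumptions, not Metaweb's real process or interface: the `Commons` class, its method names and the single approval step are all hypothetical.

```python
class Commons:
    """A toy data commons: relationships must be proposed and approved before use."""

    def __init__(self):
        self.kinds = {}      # thing name -> kind of thing
        self.pending = {}    # proposed relationship -> (subject kind, object kind)
        self.approved = {}   # approved relationship -> (subject kind, object kind)
        self.facts = []      # (subject, relationship, object) triples

    def add_thing(self, name, kind):
        self.kinds[name] = kind

    def propose_relationship(self, name, subject_kind, object_kind):
        """Anyone may propose a new kind of relationship."""
        self.pending[name] = (subject_kind, object_kind)

    def approve_relationship(self, name):
        """Stand-in for the editing and approval step."""
        self.approved[name] = self.pending.pop(name)

    def state(self, subject, relationship, obj):
        """Record a specific instance, checked against the approved definition."""
        if relationship not in self.approved:
            raise ValueError(f"relationship not approved: {relationship}")
        expected = self.approved[relationship]
        actual = (self.kinds.get(subject), self.kinds.get(obj))
        if actual != expected:
            raise TypeError(f"{relationship} expects {expected}, got {actual}")
        self.facts.append((subject, relationship, obj))


commons = Commons()
commons.add_thing("dog", "animal")
commons.add_thing("bus", "vehicle")
commons.propose_relationship("chases", "animal", "vehicle")
commons.approve_relationship("chases")
commons.state("dog", "chases", "bus")
```

With the relationship defined this way, the question of whether the dog chases the bus or the other way around has a recorded answer: stating ("bus", "chases", "dog") is rejected because the kinds do not match the approved definition.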

At least for now, the Metaweb approach is more likely to yield short-term results that look intelligent.
