
Graph theory


This is a cross-post from my personal development blog, about a day or two spent jamming on path finding.

I enjoy messing with path finding algorithms and finding interesting ways to obtain the results; this post covers a few of my more recent attempts.


Faced with the need to generate ever-greater insight and end-user value, some of the world’s most innovative companies — Google, Facebook, Twitter, Adobe and American Express among them — have turned to graph technologies to tackle the complexity at the heart of their data.

To understand how graphs address data complexity, we need first to understand the nature of the complexity itself. In practical terms, data gets more complex as it gets bigger, more semi-structured, and more densely connected.

We all know about big data. The volume of net new data being created each year is growing exponentially — a trend that is set to continue for the foreseeable future. But increased volume isn’t the only force we have to contend with today: On top of this staggering growth in the volume of data, we are also seeing an increase in both the amount of semi-structure and the degree of connectedness present in that data.

Semi-Structure

Semi-structured data is messy data: data that doesn’t fit into a uniform, one-size-fits-all, rigid relational schema. It is characterized by the presence of sparse tables and lots of null checking logic — all of it necessary to produce a solution that is fast enough and flexible enough to deal with the vagaries of real world data.

Increased semi-structure, then, is another force with which we have to contend, besides increased data volume. As data volumes grow, we trade insight for uniformity; the more data we gather about a group of entities, the more that data is likely to be semi-structured.

Connectedness

But insight and end-user value do not simply result from ramping up volume and variation in our data. Many of the more important questions we want to ask of our data require us to understand how things are connected. Insight depends on us understanding the relationships between entities — and often, the quality of those relationships.

Here are some examples, taken from different domains, of the kinds of important questions we ask of our data:

  • Which friends and colleagues do we have in common?
  • What’s the quickest route between two stations on the metro?
  • What do you recommend I buy based on my previous purchases?
  • Which products, services and subscriptions do I have permission to access and modify? Conversely, given this particular subscription, who can modify or cancel it?
  • What’s the most efficient means of delivering a parcel from A to B?
  • Who has been fraudulently claiming benefits?
  • Who owns all the debt? Who is most at risk of poisoning the financial markets?

To answer each of these questions, we need to understand how the entities in our domain are connected. In other words, these are graph problems.
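As a rough illustration of two of the questions above, here is a minimal Python sketch. The station names and the `METRO` map are made up for the example, and "quickest" is simplified to "fewest stops", which only holds if every hop takes about the same time. The route is found with a plain breadth-first search; the "in common" question reduces to a set intersection over neighbors.

```python
from collections import deque

# Hypothetical metro map: station -> directly connected stations.
METRO = {
    "Central": ["Museum", "Harbor"],
    "Museum": ["Central", "Park"],
    "Harbor": ["Central", "Park", "Airport"],
    "Park": ["Museum", "Harbor", "Stadium"],
    "Airport": ["Harbor"],
    "Stadium": ["Park"],
}


def fewest_stops(graph, start, goal):
    """Breadth-first search: return one path with the fewest hops, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        station = queue.popleft()
        if station == goal:
            # Walk the predecessor links back to the start.
            path = []
            while station is not None:
                path.append(station)
                station = previous[station]
            return path[::-1]
        for neighbor in graph[station]:
            if neighbor not in previous:
                previous[neighbor] = station
                queue.append(neighbor)
    return None


def in_common(graph, a, b):
    """Neighbors shared by a and b (the 'friends in common' question)."""
    return set(graph[a]) & set(graph[b])


route = fewest_stops(METRO, "Museum", "Airport")
shared = in_common(METRO, "Central", "Park")
```

Both functions only ever follow connections from one entity to the next, which is the sense in which these questions are graph problems.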

Why are these graph problems? Because graphs are the best abstraction we have for modeling and querying connectedness. Moreover, the malleability of the graph structure makes it ideal for creating high-fidelity representations of a semi-structured domain. Traditionally relegated to the more obscure applications of computer science, graph data models are today proving to be a powerful way of modeling and interrogating a wide range of common use cases. Put simply, graphs are everywhere.

Graph Databases

Today, if you’ve got a graph data problem, you can tackle it using a graph database — an online transactional system that allows you to store, manage and query your data in the form of a graph. A graph database enables you to represent any kind of data in a highly accessible, elegant way using nodes and relationships, both of which may host properties:

  • Nodes are containers for properties, which are key-value pairs that capture an entity’s attributes. In a graph model of a domain, nodes tend to be used to represent the things in the domain. The connections between these things are expressed using relationships.
  • A relationship has a name and a direction, which together lend semantic clarity and context to the nodes connected by the relationship. Like nodes, relationships can also contain properties: Attaching one or more properties to a relationship allows us to weight that relationship, or describe its quality, or otherwise qualify its applicability for a particular query.

The key thing about such a model is that it makes relations first-class citizens of the data, rather than treating them as metadata. As real data points, they can be queried and understood in their variety, weight and quality: Important capabilities in a world of increasing connectedness.
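To make the node and relationship model concrete, here is a small in-memory Python sketch. It is not the API of any particular graph database; the `Node` and `Relationship` classes, the `outgoing` helper, and the sample people are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Key-value pairs describing the entity.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node   # direction runs from start ...
    name: str     # ... via a named relationship ...
    end: Node     # ... to end
    properties: dict = field(default_factory=dict)


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
carol = Node({"name": "Carol"})

relationships = [
    Relationship(alice, "KNOWS", bob, {"since": 2009}),
    Relationship(alice, "WORKS_WITH", carol, {"strength": 0.9}),
    Relationship(bob, "KNOWS", carol, {"since": 2011}),
]


def outgoing(node, name, predicate=lambda props: True):
    """Nodes reached from `node` over relationships called `name`
    whose properties satisfy `predicate`."""
    return [
        r.end
        for r in relationships
        if r.start is node and r.name == name and predicate(r.properties)
    ]


# Relationship properties can qualify a query, e.g. only recent acquaintances.
recent = outgoing(bob, "KNOWS", lambda props: props["since"] > 2010)
```

Because each relationship is a record in its own right, the query can filter on its name, direction and properties rather than inferring connections from foreign keys.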

Graph Databases in Practice

Today, the most innovative organizations are leveraging graph databases as a way to solve the challenges around their connected data. These include major names such as Google, Facebook, Twitter, Adobe and American Express. Graph databases are also being used by organizations in a range of fields including finance, education, web, ISV and telecom and data communications.

The following examples offer use case scenarios of graph databases in practice.

  • Adobe Systems currently leverages a graph database to provide social capabilities to its Creative Cloud, a new array of services for media enthusiasts and professionals. A graph offers clear advantages in capturing Adobe’s rich data model fully, while still allowing for high performance queries that range from simple reads to advanced analytics. It also enables Adobe to store large amounts of connected data across three continents, all while maintaining high query performance.
  • Europe’s No. 1 professional network, Viadeo, has integrated a graph database to store all of its users and relationships. Viadeo currently has 40 million professionals in its network and requires a solution that is easy to use and capable of handling major expansion. Upon integrating a graph model, Viadeo has accelerated its system performance by more than 200 percent.
  • Telenor Group is one of the top ten wireless telecom companies in the world, and uses a graph database to manage its customer organizational structures. The ability to model and query complex data such as customer and account structures with high performance has proven to be critical to Telenor’s ongoing success.

An access control graph. Telenor uses a similar data model to manage products and subscriptions.

  • Deutsche Telekom leverages a graph database for its highly scalable social soccer fan website, which attracts tens of thousands of visitors during each soccer match. There, the graph database provides painless data modeling, seamless data model extensibility, and high performance and reliability.
  • Squidoo is the popular social publishing platform where users share their passions. They recently created a product called Postcards, which are single-page, beautifully designed recommendations of books, movies, music albums, quotes and other products and media types. A graph database ensures that users have an awesome experience as it provides a primary data store for the Postcards taxonomy and the recommendation engine for what people should be doing next.

Such examples prove the pervasiveness of connections within data and the power of a graph model to optimally map relationships. A graph database allows you to further query and analyze such connections to provide greater insight and end-user value. In short, graphs are poised to deliver true competitive advantage by offering deeper perspective into data as well as a new framework to power today’s revolutionary applications.

A New Way of Thinking

Graphs are a new way of thinking for explicitly modeling the factors that make today’s big data so complex: Semi-structure and connectedness. As more and more organizations recognize the value of modeling data with a graph, they are turning to the use of graph databases to extend this powerful modeling capability to the storage and querying of complex, densely connected structures. The result is the opening up of new opportunities for generating critical insight and end-user value, which can make all the difference in keeping up with today’s competitive business environment.

Emil is the founder of the Neo4j open source graph database project, which is the most widely deployed graph database in the world. As a life-long compulsive programmer who started his first free software project in 1994, Emil has with horror witnessed his recent degradation into a VC-backed powerpoint engineer. As the CEO of Neo4j’s commercial sponsor Neo Technology, Emil is now mainly focused on spreading the word about the powers of graphs and preaching the demise of tabular solutions everywhere. Emil presents regularly at conferences such as JAOO, JavaOne, QCon and OSCON.


For this homework you will implement a simple distributed hill-climbing algorithm and test its behaviors on various graphs. The goal is for you to become proficient at using the NetLogo link primitives and to gain first-hand experience with the problems and benefits of distributed hill-climbing algorithms.

  1. Read the link primitives section of the NetLogo manual.
  2. Implement a NetLogo model that generates a random graph, by using a simple algorithm: first create num-nodes nodes (a slider), then create num-nodes * edge-ratio edges, each one connected to two randomly chosen nodes.
  3. Implement another button which instead generates the graph using preferential attachment. Specifically: first create num-nodes nodes, then create num-nodes * edge-ratio edges, each one created by picking two nodes chosen with a probability proportional to their degree (number of incident edges). For example, if there are 3 nodes, one with two edges and the other two with one edge each (the graph is a line), then the one with two edges gets chosen with probability 2 / 4 = 1/2, while each of the other two nodes is chosen with probability 1/4. The numerator is the degree of that node and the denominator is always the sum of all node degrees, which is twice the total number of edges.
  4. Implement a num-colors slider and randomly color the nodes using that many colors.
  5. Implement a layout button which calls one of the built-in NetLogo layout methods to make the graph look pretty.
  6. Implement a basic hill-climbing algorithm. On each tick every node looks at the colors of its neighbors and changes its color to one that does not conflict with any. If there is no such color then it will change to one that minimizes the number of constraint violations with its neighbors (min-conflict heuristic). If at any tick none of the nodes changes its color then we stop since a coloring has been found.
  7. Add a test button which performs a more extensive test. Specifically, for the given number of nodes and edge ratio, it will generate 100 graphs of each type (random and preferential attachment) and run the hill-climbing algorithm on each, plotting the number of time steps it took to find a solution in a histogram, one for random and one for preferential attachment. You will need to set an arbitrarily large number for the 'does not stop' case.
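The two graph generators from steps 2 and 3 can be sketched outside NetLogo as well. The Python below is only an illustration of the logic, not a substitute for the NetLogo model, and it makes assumptions the assignment leaves open: self-loops and duplicate edges are rejected, the edge count is capped at the number of possible pairs, and preferential attachment weights each node by degree + 1 so that a node with no edges yet can still be picked (with pure degree weights, nothing could be chosen while the graph is still empty). All function names are hypothetical.

```python
import random
from fractions import Fraction


def degree_probabilities(num_nodes, edges):
    """Pure degree-proportional probabilities, as in the worked example.
    Requires at least one edge (otherwise the total degree is zero)."""
    deg = [0] * num_nodes
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    total = sum(deg)  # always 2 * len(edges)
    return [Fraction(d, total) for d in deg]


def edge_target(num_nodes, edge_ratio):
    """Number of edges to create, capped at the number of distinct pairs."""
    return min(int(num_nodes * edge_ratio), num_nodes * (num_nodes - 1) // 2)


def random_graph(num_nodes, edge_ratio, rng):
    """Step 2: connect uniformly chosen pairs of distinct nodes."""
    edges = set()
    while len(edges) < edge_target(num_nodes, edge_ratio):
        a, b = rng.sample(range(num_nodes), 2)
        edges.add((min(a, b), max(a, b)))
    return edges


def preferential_graph(num_nodes, edge_ratio, rng):
    """Step 3: pick endpoints with weight degree + 1 (an assumption, see above)."""
    edges = set()
    deg = [0] * num_nodes
    while len(edges) < edge_target(num_nodes, edge_ratio):
        weights = [d + 1 for d in deg]
        a, b = rng.choices(range(num_nodes), weights=weights, k=2)
        edge = (min(a, b), max(a, b))
        if a == b or edge in edges:
            continue  # reject self-loops and duplicate edges, then retry
        edges.add(edge)
        deg[a] += 1
        deg[b] += 1
    return edges


# The line graph 0 - 1 - 2 from step 3: the middle node has degree 2.
line_probabilities = degree_probabilities(3, [(0, 1), (1, 2)])
```

Passing in a seeded `random.Random` instance keeps runs reproducible, which helps when comparing the two graph types in step 7.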

Add your name and describe your results in the model description tab. Email me your .nlogo file by Wednesday September 21 @9am.
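For reference, the min-conflict update from step 6 can be sketched in Python as follows. This is one possible reading of the step, not the required NetLogo solution: it assumes all nodes update synchronously from the same snapshot of colors, a node keeps its color when that color is already among the least conflicting, and ties between better colors are broken at random. The `max_ticks` cap plays the role of the "does not stop" value from step 7; all names are hypothetical.

```python
import random


def conflicts(node, color, colors, neighbors):
    """Number of neighbors of `node` that currently have `color`."""
    return sum(1 for n in neighbors[node] if colors[n] == color)


def tick(colors, neighbors, num_colors, rng):
    """One synchronous step: every node decides from the old colors."""
    new_colors = list(colors)
    changed = False
    for node in range(len(colors)):
        counts = [conflicts(node, c, colors, neighbors) for c in range(num_colors)]
        best = min(counts)
        if counts[colors[node]] > best:  # a strictly better color exists
            options = [c for c in range(num_colors) if counts[c] == best]
            new_colors[node] = rng.choice(options)
            changed = True
    return new_colors, changed


def hill_climb(edges, num_nodes, num_colors, rng, max_ticks=1000):
    """Return (colors, ticks), with ticks = None if the cap was reached."""
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    colors = [rng.randrange(num_colors) for _ in range(num_nodes)]
    for ticks in range(max_ticks):
        colors, changed = tick(colors, neighbors, num_colors, rng)
        if not changed:
            return colors, ticks
    return colors, None


# Two neighbors sharing a color both jump to the only other color.
flipped, did_change = tick([0, 0], [[1], [0]], 2, random.Random(0))
```

In this synchronous version, a single edge whose endpoints start with the same color flips back and forth forever when only two colors are available, so the cap is needed. Note also that a tick in which nothing changes only means no node can reduce its own conflicts; it does not by itself guarantee a conflict-free coloring.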


Narrative graphs are a useful tool for charting out any narrative, but are especially useful for the development of game stories. This article gives an overview of how this design tool can be used.


Welcome to a series of blog articles about my experiment (read: stumbling around) of marrying data-oriented, memory-streamlined behavior trees with a second representation to ease creation and modification during development. I write it to document my findings and decisions and to ask for your invaluable feedback to build a BSD licensed BT toolkit that is truly useful.

Article Updates

  • March 10, 2011 – Added a reference to the second article in my behavior tree experiment blog series.
  • March 05, 2011 – Posted this article on my own blog bjoernknafla.com, too.
  • March 02, 2011 – Reader eric wrote a fantastic behavior tree feature analysis in the comments section. Don’t miss it!
  • February 24, 2011 – added a reference section to the end of the article with additional references not found in the text.
  • February 24, 2011 – added a section to show the advantages of behavior trees over finite state machines based on a question by snake5.

Background

Behavior trees (BTs) are deployed by more and more game AI programmers to implement reactive decision making and control of the virtual creatures entrusted to them, as AiGameDev.com’s Alex Champandard notes in his retrospective for 2010 and outlook for 2011.

What is a behavior tree and how does it tick

My view and understanding of behavior trees has been fundamentally shaped by Alex Champandard’s tutorials and online masterclasses on AiGameDev.com. Alex’s behavior tree definition is very elaborate and detailed. Other great online resources about behavior trees in games are Damian Isla’s Gamasutra article about the AI in Halo 2, Max Dyckhoff’s presentation about Decision Making and Knowledge Representation in Halo 3, and Ricard Pillosu’s Coordinating Agents with Behavior Trees slides about their use in Crysis. Joost van Dongen blogs about the role behavior trees play in Swords and Soldiers here and here.
