For the entire weekend and a bit of Monday, I've been tweaking my di-graphs to represent flickr entries and in general, that has produced some amazing results. For example, I have 700+ people in a Contact of a Contact relationship and nearly 13,000 people in the next level of connectivity. But in particular, I was analyzing for cliques in the graph - a completely connected subgraph in which every node is connected to every other. For example, me, spo0nman, premshree and teemus was the first clique the system identified (*duh!*).
Initially, I had dug my trusty Sedgewick to lookup graph algorithms and quickly lost myself in boost::graph land. STL is not something I enjoy doing - this was getting more and more about my lack of const somewhere rather than real algorithms. And then I ran into NetworkX in python.
NetworkX is an amazingly library - very efficient and very well written. The library uses raw python structures as input and output and is not wrapped over with classes or anything else like that. The real reasons for this came up as I started using python properly rather than rewrite my C++ code in python syntax. When I was done, I was much more than impressed with the language than the library itself. Here are a few snippets of code which any C programmer should read twice :)
def fill_nodes(graph, id, contacts): nodes = [v.id for v in contacts] edges = [(id, v.id) for v in contacts] graph.add_nodes_from(nodes) graph.add_edges_from(edges) def color_node(graph): global cmap node_colours = map( lambda x: cmap[graph.degree(x) % cmap.length], graph.nodes())
Or one of the more difficult operations, deleting unwanted nodes from a graph.
# trim stray nodes def one_way_ride(graph): deleted_nodes = filter( lambda x: graph.degree(x) == 1, graph.nodes()) deleted_edges = filter( lambda x: graph.degree(x[0]) == 1 or graph.degree(x[1]) == 1, graph.edges()) graph.delete_edges_from(deleted_edges) graph.delete_nodes_from(deleted_nodes)
The sheer fluidity of the lambda forms are amazing and I'm getting a hang of this style of thinking. And because I was doing it in python, it was relatively easy to create a cache for the ws requests with cPickle. Anyway, after fetching all the data and all this computation, I managed to layout the graph and represent it interactively, in the process forgetting about clique analysis, but that's a whole different blog entry anyway.
--The worst cliques are those which consist of one man.
-- G. B. Shaw