Unofficial Erlang Planet

July 02, 2009

Mickaël Rémond

OneTeam 3.0 for iPhone

The new version 3.0 of OneTeam for iPhone has been released, with the push notification feature from Apple Push Notification Service (APNS).

When moving with your iPhone, as a pedestrian, bike rider, car driver or in public transportation, you might have to shut down the OneTeam for iPhone application while staying connected to your XMPP server. This is convenient for saving battery, or because you want to use another application (iPhone OS 3.0 still does not allow background applications).

When an event occurs on your XMPP server, it is sent to the APNS (Apple Push Notification Service), which instantly relays it to your iPhone. A push notification then pops up on your screen, with a sound played. The OneTeam badge (icon) is then changed, showing that a new event has arrived, or the number of events waiting for you to read.

image

You simply have to tap the OneTeam icon to read your new event. The reconnection will be fast and seamless.

This feature is very handy, since you can use your iPhone the way you want (browse the web, use another application) while still being instantly notified as if you were using OneTeam in the background.

This push notification feature is also deployed on the OneTeam.im XMPP server, and ejabberd has a component, IMpush, that can send push notifications to the APNS. This makes ejabberd and OneTeam the first XMPP client and server solution that supports the Apple push system.

OneTeam for iPhone has also received fixes and improvements: groupchat, file transfer, image sending... Here are a few screenshots:

An example of Groupchat:

image

See the Push configuration screen:

image

And the general OneTeam configuration:

image

A few comments on the experience after having used OneTeam for iPhone with push and groupchat:

  • "Battery life is not impacted by being always connected to my XMPP server on mobile. I have used all mobile XMPP clients and this is the first time I can truly stay connected all the time. I feel this is really really the premise of the mobile revolution for mobile instant messaging and mobile XMPP."
  • "OneTeam for iPhone client has the type of feature you would expect from a good desktop XMPP client and not a mobile one. You can use groupchat, file transfer. This is really what I need professionally."

OneTeam for iPhone will be available on Apple's App Store in a few days, after the usual moderation period at Apple.

IMpush is available from the IMstore as of today, so that ejabberd servers can be prepared ahead of OneTeam 3.0 with push.

by Nicolas Vérité at July 02, 2009 11:59 AM

July 01, 2009

Caoyuan's Blog

Rats! Plugin for NetBeans#1: Syntax Highlighting

I've used the Rats! parser generator heavily in the Scala/Erlang plugins for NetBeans, but had not written a plugin for Rats! itself.

So I spent my weekend days on a simple Rats! editor module, which implemented syntax highlighting. It's built on Scala.

Here's the snapshot:

image

It will be available in a couple of days.

by dcaoyuan at July 01, 2009 09:42 PM

Dukes of Erl

osmos

I just released the first version of osmos, a pure Erlang library that provides on-disk ordered set tables which allow thousands of updates per second with ACID properties.

It achieves that update rate using a rather different structure from a traditional B-tree based table (like the ones used by most RDBMSs or provided by DBM libraries like BDB or tokyocabinet): an incremental merge sort with user-defined merge semantics.

Motivation

Ordinarily, the rate of updates to an on-disk table is limited by the need to seek around and update an index in place. With a typical seek time on the order of 10 ms, this makes it challenging to scale past about 100 updates/s. Most strategies for going beyond that involve some kind of partitioning, either over multiple disks, multiple machines, or both.

However, a few key observations[1] point to a different strategy:
  1. The reason for updating an index on every write is the expectation that reads are much more frequent than writes, so that read efficiency is the dominating factor. But if writes are far more frequent than reads, you can use some kind of lazy updating to delay the work until absolutely necessary, and combine the work from multiple updates.
  2. An extreme example of a write-dominated, lazily updated database is a full-text inverted index for a search engine. To build one, you might typically read in billions of term-document pairs, sort them by term using some form of external merge sort, and then create a highly optimized index before ever handling a single query.
  3. A merge sort can operate continuously, by injecting new records in sorted batches, and then merging the batches as necessary to maintain a set of sorted files with exponentially increasing sizes. And crucially, this kind of incremental merge sort process can allow relatively efficient searches of the data while it's operating, by binary-searching each sorted file. (An example of this is the incremental indexing provided by Lucene.)
This gives you an ordered set table with a slight increase in the cost of a search (with N records, maybe an extra factor of log N). But the cost of a write is tiny, and mostly delayed: about log N future comparisons during merging, and log N future disk writes, but since all the disk writes are sequential, they will be buffered, and writing to the table requires no explicit seeking at all.[2]
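
The incremental merge sort described above can be sketched in memory as follows. This is only an illustration of the idea, not osmos's implementation: the module name merge_levels and all of its functions are hypothetical, sorted "files" are modelled as sorted lists, a newer record simply replaces an older one, and lookups scan each run linearly where a real table would binary-search on disk.

```erlang
-module(merge_levels).
-export([new/0, write_batch/2, lookup/2, sizes/1]).

%% A table is a list of sorted runs, newest first. Each run is a list of
%% {Key, Value} tuples sorted by key, with at most one tuple per key.
new() -> [].

%% Sort the incoming batch, push it as the newest run, then compact.
%% Reversing first makes ukeysort keep the last write for a repeated key.
write_batch(Batch, Runs) ->
    Sorted = lists:ukeysort(1, lists:reverse(Batch)),
    compact([Sorted | Runs]).

%% Merge the two newest runs while the newer one is at least as large as
%% the older one, so run sizes grow roughly exponentially with age.
%% ukeymerge keeps the tuple from its first list when keys are equal,
%% so the newer record wins.
compact([Newer, Older | Rest]) when length(Newer) >= length(Older) ->
    compact([lists:ukeymerge(1, Newer, Older) | Rest]);
compact(Runs) ->
    Runs.

%% Search the runs from newest to oldest and stop at the first hit.
lookup(_Key, []) ->
    not_found;
lookup(Key, [Run | Older]) ->
    case lists:keyfind(Key, 1, Run) of
        {Key, Value} -> {ok, Value};
        false -> lookup(Key, Older)
    end.

%% Number of records in each run, newest first.
sizes(Runs) ->
    [length(Run) || Run <- Runs].
```

Writing the batches [{b,1},{a,1}], [{a,2}] and [{c,3}] in that order leaves a single merged run of three records, and a lookup of a returns the later value 2.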

User-defined merging

Things get even more interesting when you let the user control how records are merged. In the osmos model, there is at most one record for any given key; if two records with the same key are encountered during the merge sort, the user's merge function is called to merge the two records into a single record.

The merge function can be any function

Merge(Key, EarlierValue, LaterValue) -> MergedValue
that is associative, i.e.,

Merge(K, Merge(K, V1, V2), V3) =:=
Merge(K, V1, Merge(K, V2, V3))
for any key K and any consecutive sequence of values V1, V2, V3.

This allows a wide variety of semantics for writing to the table. For example:
  • If the merge function always returns the later value, then a write replaces any previous value, like an ordinary key-value store.
  • If the values are integers, and the merge function returns the sum of the two values, then writing to the table acts like transactionally incrementing a counter.
Similarly, you could use any associative function of two numbers; you could apply such a function to each element of a vector of numbers; or you could apply a different function to each element. For example, to keep a minimum, maximum, and average over some set of keys, you could use something like:

merge(_K, N1, N2)
  when is_number(N1), is_number(N2) ->
    {min(N1, N2), max(N1, N2), N1 + N2, 2};
merge(_K, N1, {Min2, Max2, Sum2, Count2})
  when is_number(N1) ->
    {min(N1, Min2), max(N1, Max2),
     N1 + Sum2, 1 + Count2};
merge(_K, {Min1, Max1, Sum1, Count1}, N2)
  when is_number(N2) ->
    {min(Min1, N2), max(Max1, N2),
     Sum1 + N2, Count1 + 1};
merge(_K, {Min1, Max1, Sum1, Count1},
      {Min2, Max2, Sum2, Count2}) ->
    {min(Min1, Min2), max(Max1, Max2),
     Sum1 + Sum2, Count1 + Count2}.
This lets you write single numbers as values, but read back either {Min, Max, Sum, Count} tuples (if more than one number has been written for a key) or single numbers (if that was the only value written). To do this with an ordinary key-value table and multiple writers would require expensive transactions, but with osmos, operations like this are no more expensive than replacement, but still ACID.
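
To see the associativity requirement in action, here is a small self-contained sketch that repeats the min/max/sum/count merge function and folds a sequence of writes with it. The module name stats_merge and the helpers merge_all/2 and average/1 are hypothetical additions for illustration; they are not part of osmos.

```erlang
-module(stats_merge).
-export([merge/3, merge_all/2, average/1]).

%% The merge function from the text: single numbers are promoted to
%% {Min, Max, Sum, Count} tuples as soon as two values meet.
merge(_K, N1, N2) when is_number(N1), is_number(N2) ->
    {min(N1, N2), max(N1, N2), N1 + N2, 2};
merge(_K, N1, {Min2, Max2, Sum2, Count2}) when is_number(N1) ->
    {min(N1, Min2), max(N1, Max2), N1 + Sum2, 1 + Count2};
merge(_K, {Min1, Max1, Sum1, Count1}, N2) when is_number(N2) ->
    {min(Min1, N2), max(Max1, N2), Sum1 + N2, Count1 + 1};
merge(_K, {Min1, Max1, Sum1, Count1}, {Min2, Max2, Sum2, Count2}) ->
    {min(Min1, Min2), max(Max1, Max2), Sum1 + Sum2, Count1 + Count2}.

%% Merge a non-empty list of values in write order, grouping from the
%% left, the way a reader would see them after all merges have happened.
merge_all(K, [First | Rest]) ->
    lists:foldl(fun(Later, Earlier) -> merge(K, Earlier, Later) end,
                First, Rest).

%% Derive the average from whatever a read returns.
average({_Min, _Max, Sum, Count}) -> Sum / Count;
average(N) when is_number(N) -> N.
```

Writing 5, 1 and 9 for the same key yields {1, 9, 15, 3} whether the table merges the first two values first or the last two first, which is exactly what lets the merge sort combine records in any grouping.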

As you can see, keeping statistics for reporting (when the reports are queried infrequently relative to data collection) is one of the killer applications for a merge sort table.

Among the possibilities for even wackier merge operations are:
  • Always return the earlier value. (What was the first value that occurred for this key?)
  • Take the union of two sets. (What are all the values that have occurred for this key?)
  • Take the intersection of two sets. (What values have always occurred for this key?)
  • Multiply two NxN matrices, e.g., to keep track of a series of rotations applied to a vector in R^N.
  • Compose a series of arbitrary operations applied to a space of replaceable objects, e.g.:

    merge(_K, _, V) when ?is_value(V) ->
        V;
    merge(_K, V, Op) when ?is_value(V),
                          ?is_operation(Op) ->
        do_operation(Op, V);
    merge(_K, V, Ops) when ?is_value(V),
                           is_list(Ops) ->
        %% Apply the pending operations in order, oldest first.
        lists:foldl(fun (Op, Acc) ->
                        do_operation(Op, Acc)
                    end,
                    V,
                    Ops);
    merge(_K, Op1, Op2) when ?is_operation(Op1),
                             ?is_operation(Op2) ->
        [Op1, Op2];
    merge(_K, Op, Ops) when ?is_operation(Op),
                            is_list(Ops) ->
        [Op | Ops];
    merge(_K, Ops, Op) when is_list(Ops),
                            ?is_operation(Op) ->
        Ops ++ [Op];
    merge(_K, Ops1, Ops2) when is_list(Ops1),
                               is_list(Ops2) ->
        Ops1 ++ Ops2.
    The values could be employee records, and the operations could be things like, “change street address to X,” “increase salary by Y%.” Using this pattern, you can get extremely cheap transactional safety for any single-key operation, as long as your merge function implements it.

API

The basic API is quite simple:

{ok, Table} = osmos:open(Table, [{directory, D}, {format, F}])
to open a table named Table with the format F in the directory D;

ok = osmos:write(Table, Key, Value)
to write a record to the table;

case osmos:read(Table, Key) of
{ok, Value} -> ...;
not_found -> ...
end
to read the record for a key; and

ok = osmos:close(Table)
to close the table.

You can also iterate over a range of keys in chunks using osmos:select_range/5 and osmos:select_continue/3. The results from a select provide a consistent snapshot of the table, meaning that the results always reflect the contents of the table at the time of the original call to select_range. (In other words, any writes that happen between subsequent calls to select_continue won't affect the results.)

A table format is a record with the following fields:
  • block_size::integer(): block size of the table in bytes, controlling the size of disk reads (which are always whole blocks), and the fanout of the on-disk search trees.
  • key_format::#osmos_format{}: on-disk format for keys. (A pair of functions to convert some set of terms to binaries and back.)
  • key_less::(KeyA, KeyB) -> bool(): comparison function defining the order of keys in the table. Takes two native-format keys, and returns true if the first argument is less than the second argument.
  • value_format::#osmos_format{}: on-disk format for values.
  • merge::(Key, EarlierValue, LaterValue) -> MergedValue: the merge function described above.
  • short_circuit::(Key, Value) -> bool(): function which allows searches of the table to be terminated early (short-circuited) if it can be determined from a record that any earlier records with the same key are irrelevant.
  • delete::(Key, Value) -> bool(): function controlling when records are deleted from the table.
There are several pre-canned formats available from the function osmos_table_format:new/3, or you can build your own #osmos_table_format{} record as needed.

Performance

The file tests/osmos_benchmark_1.erl in the source distribution contains a benchmark that uses variable-length binaries as keys (some more frequent than others, with an average length of 15 bytes), and nonnegative integers as values, encoded in 64 bits, where merging takes the sum of the two values. One process writes random keys and values as fast as it can, while another process reads random keys with a 10 ms sleep between reads.

I ran the benchmark for 15 minutes on a 2.2 GHz dual-core MacBook, and got the following numbers:
  • 5028735 total records written, for an average of 5587 writes/second (including the time to compute random keys, etc.)
  • an average of 109.8 microseconds per write call, which would mean a theoretical maximum write rate of 9111 writes/second (for this table format, machine, etc.)
  • the median time per write call was 30 microseconds, and the 90th percentile was 54 microseconds, indicating that the vast majority of writes are extremely quick
  • an average of 1900 microseconds (1.9 ms) per read call
  • the median time per read call was 979 microseconds, and the 90th percentile was 2567 microseconds

I reran the benchmark for 2 hours on the same machine, and got the following numbers:
  • 17247676 total records written, for an average of 2396 writes/second
  • an average of 329.7 microseconds per write call, for a theoretical maximum write rate of 3033 writes/second
  • the median time per write call was 39 microseconds, and the 90th percentile was 88 microseconds
  • an average of 13076 microseconds (13 ms) per read call
  • the median time per read call was 6081 microseconds, and the 90th percentile was 34094 microseconds
The table had about 400 MB of data on the disk (in 7 files) at the end of the 2-hour run. This shows that read performance does start to suffer a bit as the amount of data on the disk grows, but writes remain very fast. (In fact, if there were no reads competing with writes, I wouldn't expect the write times to degrade even that much, since all that's happening synchronously during a write call is a buffered write to a journal file, and an insert into an in-memory tree.)
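
The throughput figures above follow from simple arithmetic, sketched here for readers who want to reproduce them. The module and function names are hypothetical, and the inputs are the rounded numbers quoted in the lists above.

```erlang
-module(bench_math).
-export([writes_per_second/2, max_rate/1]).

%% Average write rate over a whole run.
writes_per_second(TotalWrites, Seconds) ->
    TotalWrites / Seconds.

%% Theoretical maximum rate if the writer did nothing but call write,
%% given the average time per call in microseconds.
max_rate(MicrosPerCall) ->
    1.0e6 / MicrosPerCall.
```

For example, bench_math:writes_per_second(5028735, 15 * 60) gives about 5587 and bench_math:max_rate(329.7) gives about 3033. Applying max_rate to the rounded 109.8 microseconds gives about 9107 rather than the quoted 9111, presumably because the quoted figure was computed from the unrounded average.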

[1] I have to thank my good friend Dave Benson for introducing me to these ideas, and the generalized merge sort table. His excellent library GSK provides a very similar table API (GskTable) for people who want to write single-threaded servers in C. (I ripped off several of his design ideas, including prefix compression and the variable-length integer encoding.)
[2] Of course there may be implicit seeking due to the need to access multiple files at the same time. But for sequential access, the OS and disk hardware can mitigate this to a large degree as long as the number of files isn't too large.

by Michael Radford (noreply@blogger.com) at July 01, 2009 03:45 PM

Erlang Inside

erlang:lists/1

Welcome to the start of the erlang:lists series where we list some interesting happenings in the world of Erlang.

erlang:lists(CouchDBNaked) - Harish Mallipeddi, a performance engineer at Yahoo, has put together a great post on the internals of CouchDB, and how to even use some of the couch source to build your own B-Tree based mini-application.

erlang:lists(ErlangFactory) - Slides and videos are up from the 2009 London Erlang Factory conference. About half have videos and most have slides available. Not sure yet if they will all have videos or if there are reasons for not showing the films for some speakers.

Each week I’ll mention a few must-see articles, blog entries or videos. If you have something to bring to my attention - contact me at chad at inakanetworks com.

by Chad DePue at July 01, 2009 02:15 AM

June 30, 2009

Mickaël Rémond

ProcessOne presentations at Erlang Factory

There have been two presentations from ProcessOne at Erlang Factory. Here are the slides of the presentations.

The first presentation by me (Mickaël Rémond) was titled "OneTeam Media Server: Adding Video to Instant Messaging with Erlang".

The second presentation, by Geoff Cant, was titled "Whitelabel Erlang" and gave feedback on how Erlang can be used to write scalable hosted applications for business.

by Mickaël Rémond at June 30, 2009 02:12 PM

Erlang Training and Consulting

30 June 2009: The Erlang Factory London is still the largest Erlang event!

The London Erlang Factory retained its place as the biggest gathering of Erlang talent in 2009, surpassing even the Palo Alto Factory! With more than 40 speakers and a series of 10-minute talks at the Erlounge, the London Factory brought together Erlangers from five continents. The Erlang Universities also proved popular and allowed delegates to combine their training with the conference for a great value-adding experience.

The presentation slides and videos of the talks can be found on the Erlang Factory website. You can still follow @erlangfactory on Twitter and join us on Facebook where we have put some photos from the event.

June 30, 2009 01:46 PM

LShift

PubSub-over-Webhooks with RabbitHub

RabbitHub is our implementation of PubSubHubBub, a straightforward pubsub layer on top of plain old HTTP POST — pubsub over Webhooks. It’s not well documented yet (understatement), but that will change.

It gives every AMQP exchange and queue hosted by a RabbitMQ broker a couple of URLs: one to use for delivering messages to the exchange or queue, and one to use to subscribe to messages forwarded on by the exchange or queue. You subscribe with a callback URL, so when messages arrive, RabbitHub POSTs them on to your callback. For example,

(The symmetrical …/subscribe/x/… and …/endpoint/q/… also exist.)

The PubSubHubBub protocol specifies some RESTful(ish) operations for establishing subscriptions between message sources (a.k.a “topics”) and message sinks. RabbitHub implements these operations as well as a few more for RESTfully creating and deleting exchanges and queues.

Combining RabbitHub with the AMQP protocol implemented by RabbitMQ itself and with the other adapters and gateways that form part of the RabbitMQ universe lets you send messages across different kinds of message networks — for example, our public RabbitMQ instance, dev.rabbitmq.com, has RabbitHub running as well as the standard AMQP adapter, the rabbitmq-xmpp plugin, and a bunch of our other experimental stuff, so you can do things like this:

RabbitHub example configuration

  • become XMPP friends with pshb@dev.rabbitmq.com (the XMPP adapter gives each exchange a JID of its own)

  • use PubSubHubBub to subscribe the sink http://dev.rabbitmq.com/rabbithub/endpoint/x/pshb to some PubSubHubBub source — perhaps one on the public Google PSHB instance. (Note how the given URL ends in “x/pshb”, meaning the “pshb” exchange — which lines up with the JID we just became XMPP friends with.)

  • wait for changes to be signalled by Google’s PSHB hub to RabbitHub

  • when they are, you get an XMPP IM from pshb@dev.rabbitmq.com with the Atom XML that the hub sent out as the body

RabbitHub is content-agnostic — you don’t have to send Atom around — so the fact that Atom appears is an artifact of what Google’s public PSHB instance is mailing out, rather than anything intrinsic in pubsub-over-webhooks.

We’ve also been experimenting with using http://www.reversehttp.net/ to run a PubSubHubBub endpoint in a webpage — see for instance http://www.reversehttp.net/demos/endpoint.html and its associated Javascript for a simple prototype of the idea. I’m playing with building a simple PSHB hub in Javascript using the same tools.

by tonyg at June 30, 2009 12:49 PM

lambder

Erlang’s Dynamism

This is a more verbose answer to @bubbafat's Twitter question. It started like this:

  • @bubbafat: Functions used by spawn to start a process must be exported. Why doesn’t erl compiler error when this is missed? #erlang
  • @danielllo: @bubbafat ‘cos it is possible to define and load new modules later on during runtime in #erlang
  • @bubbafat: @danielllo Thx. Do you mean redefining the module at runtime or that the func might resolve in a different module loaded later? #erlang
  • @danielllo: You have many ways of referencing #erlang module:function not known at compile time.
    1. you can load the precompiled module at later time from a path not being provided to compiler.
    2. you can use M:F(Args) function invocation in #erlang, any of M, F, Args being variables dynamically referencing module, function and argument list.
    3. you can construct an Erlang AST programmatically, compile it at runtime and load the resulting beam.
    4. you can achieve the result of point 3 using helper tools such as LFE, Smerl or “Dynamic” module generation with compile-time macros
    5. … and I’m sure there is more ;)
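
Points 2 and 3 of the list can be illustrated with a short sketch. The module dyn_demo and the generated module dyn_hello are hypothetical names chosen for this example; the calls used (apply/3, erl_scan:string/1, erl_parse:parse_form/1, compile:forms/1 and code:load_binary/3) are standard OTP functions.

```erlang
-module(dyn_demo).
-export([call/3, compile_and_load/0]).

%% Point 2: module, function and arguments are ordinary runtime values,
%% so the compiler cannot know which function will be called.
call(M, F, Args) when is_atom(M), is_atom(F), is_list(Args) ->
    apply(M, F, Args).

%% Point 3: build the abstract forms of a module that did not exist at
%% compile time, compile them in memory and load the resulting beam.
compile_and_load() ->
    Forms = [parse("-module(dyn_hello)."),
             parse("-export([greet/1])."),
             parse("greet(Name) -> {hello, Name}.")],
    {ok, Mod, Beam} = compile:forms(Forms),
    {module, Mod} = code:load_binary(Mod, "dyn_hello.erl", Beam),
    Mod.

%% Turn one source form (terminated by a dot) into its abstract form.
parse(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

After dyn_demo:compile_and_load() has run, dyn_demo:call(dyn_hello, greet, [joe]) returns {hello, joe}, even though dyn_hello did not exist when dyn_demo was compiled. This is why a missing export can only be detected at run time.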

by Daniello at June 30, 2009 09:59 AM

June 29, 2009

Nickelcode

Where to from here…

Been mulling over my next project after I finish a small Rails app for my friend. I’m not sure I’ll have the time in the near future but here are the ideas that are competing in my head for… well literally… mind share. Both use Erlang. Unraverl is the beginnings of a meta programming/parse transform library [...]

by John Bender at June 29, 2009 05:38 PM

LShift

ICFP Contest 2009

What is fast becoming a regular fixture in my diary is my entry with a few friends into the ICFP Programming Contest each year. This is a three day programming competition in which you can write in any language to solve the problems given. The competition is still in progress, though my team’s decided to stop — we’ve had enough fun for one year, and there’s only so far you can go with very little sleep and shockingly poor maths — but we’ve done much better than last year, learning from our previous mistakes…

Last year, the competition was to write a controller for a Martian Robot. You were given the simulator and the scenario files for it and had to write a controller to direct it to avoid boulders, rocks, and 7-fingered martians. Control was a little tricky, our path finding frequently never terminated fast enough and we never actually figured out how hard it was to direct the Robot until about an hour before the finish — and yes, we went the whole 72 hour distance then. We were totally wiped out, realised we’d solved the wrong question, or at least not spotted where the difficulty lay until it was far far too late and ended up doing very badly — somewhere below 250th position. In the course of the three days we wrote about 9000 lines of Haskell and almost every commit comment was either a single letter or a curse of some sort. A friend of ours wrote a very simple controller (in C++) and did very much better than us.

We learnt several things from that experience. One was to really think about the problem and work out where the difficulties lie before getting bogged down in a broken implementation. The other was the value of writing tools — namely visualisers so that you could see what’s going on. A picture really does paint a thousand words, and Haskell has some nice GTK+/Cairo bindings so it’s pretty easy to knock out a good visualiser. Last year, because communication with the simulator was via network sockets, we also used this to dump out data to be visualised. We had to run the simulator in real time (though some teams did work out how to hack it so it would run much much faster) and so it made sense to run the visualiser in real time too — it was pretty cool to watch what our rover was doing and we also dumped out our graph for route finding which we overlaid too. However, because it was so timing sensitive, we found that our rover behaved differently if it was running with the visualiser or not — the extra cost of dumping out debug information altered its behaviour!

This year, the task involved moving satellites between different orbits. We were given only the compiled scenarios, and a spec of the VM which they were to run on. So step one was implementing the VM. This was pretty straightforward, though we immediately spotted that we wanted it to run as fast as possible, because a valid solution could be up to 3,000,000 “seconds” long and there was no way we were sitting around for that: 3,000,000 seconds is 833 hours, which is about 11.5 times longer than the whole competition was due to last! So I pretty much treated Haskell as C, got out the bit masks, and had the whole thing running in my CPU cache (ok, I have a 12MB L3 cache, so it’s not as impressive as it sounds!). I was quite pleased: with a bit of maths I worked out I was spending about 250 CPU cycles per VM instruction, or about 8 million instructions per second. We were getting through the 3,000,000 “seconds” in under 2 minutes and this was clearly very usable. We were also dumping out to a file all sorts of debug information which we would later parse and build a visualiser for. Sadly, that same friend I mentioned earlier also wrote a VM in C++ and got to about 35 CPU cycles per VM instruction. Still, within a factor of 10 is quite good for Haskell, and both of us used pretty much every trick we knew to make it go fast. I guess there’s still work to be done for the GHC developers!

Next task was writing the visualiser. Again, Haskell, GTK+/Cairo, a nice slider to allow us to step through each frame, plotting the output, and a textual dump of the information we capture in each frame. Also, quick navigation to the next “interesting” event (e.g. when we fire thrusters): searching for events like these amongst 3 million frames is a little frustrating, so quickly being able to jump computationally was a good extra feature. It suddenly became very cool to rapidly step through every 100 frames and get a primitive video of our satellite moving about, and gravity playing havoc with its every move!

And now to the problem itself. There’s a thing called a Hohmann Transfer which allows you to change between different circular orbits using two burns. You use one burn to push you out of your current orbit and into an elliptical orbit, and then 180° later you do another burn which puts you into your target circular orbit. Task one was simply doing this maths, getting it right, and going from one circular orbit to another. Sometimes the orbit would be further out, sometimes further in. This was reasonably straightforward.
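
For reference, the textbook Hohmann transfer formulas can be sketched like this. This is not the code we wrote for the contest: the module name hohmann is hypothetical, and the gravitational parameter in the usage note is the standard value for Earth rather than whatever constants the contest spec used.

```erlang
-module(hohmann).
-export([transfer/3]).

%% Mu is the gravitational parameter G*M of the central body (m^3/s^2);
%% R1 and R2 are the radii (m) of the start and target circular orbits.
%% Returns {DeltaV1, DeltaV2, TransferTime} in m/s, m/s and seconds.
%% A negative delta-v means the burn is against the direction of travel,
%% which is what happens when moving to a lower orbit.
transfer(Mu, R1, R2) when Mu > 0, R1 > 0, R2 > 0 ->
    %% First burn: circular speed at R1 to transfer-ellipse speed at R1.
    DeltaV1 = math:sqrt(Mu / R1) * (math:sqrt(2 * R2 / (R1 + R2)) - 1),
    %% Second burn: transfer-ellipse speed at R2 to circular speed at R2.
    DeltaV2 = math:sqrt(Mu / R2) * (1 - math:sqrt(2 * R1 / (R1 + R2))),
    %% Half the period of the transfer ellipse (semi-major axis (R1+R2)/2).
    Time = math:pi() * math:sqrt(math:pow(R1 + R2, 3) / (8 * Mu)),
    {DeltaV1, DeltaV2, Time}.
```

With Mu = 3.986004418e14, hohmann:transfer(Mu, 6.678e6, 4.2164e7) gives burns of roughly 2.4 km/s and 1.5 km/s and a transfer time of a little over five hours. The transfer time is the "180° later" in the text, and it is this value that has to be rounded to a whole number of simulator steps.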

The second task was not only changing orbits, but also meeting up with another satellite in the target orbit. You had to stay within 1km of the target satellite for 900 seconds to pass. So, between two steps, you could use your position and your target’s position to calculate enough information to solve the necessary equations. You would know how long you needed to sleep for (i.e. carry on in the current orbit), how much you needed to burn, how long the transfer between orbits would take, and what the final burn would be.

In this task, there were four scenarios. Three of them we got quite quickly, but the fourth eluded us for a long time (in fact, although we got a solution, we don’t have a single piece of code that can do all four tasks, so it doesn’t really count). The problem with the fourth scenario is that the target satellite is a VERY long way out. Our equations are for continuous maths, but the simulator is discrete. So when we calculate exactly how long to sleep for (which is the number of VM steps, or “seconds” to sleep for), that value has a fraction, but you can’t sleep for fractions. So this means that you either burn a bit early, or a bit late, or you try to spread the burn over two steps. Either way, you burn at the wrong time, and so you arrive in the wrong place (exactly the same rounding issue happens with the transfer time (i.e. the amount of time to spend in the elliptical orbit), and so you either try to come out of the elliptical orbit early or late). For close to earth, as three of the four problems were, this inaccuracy doesn’t matter too much — the perimeter of the circles you’re landing on is small enough that if you’re out by a few fractions of degrees, you’re still in real terms close enough to the target satellite to be within 1km and score points. When you’re going a long way from the earth, these few fractions of degrees correspond to a long distance — the closest I could reliably get was just under 10km away from the target. Not good enough.
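
The effect of orbit size on the miss distance is just arc length, which a tiny sketch makes concrete. The module name and the radii used below are illustrative assumptions, not values from the contest scenarios.

```erlang
-module(arc_error).
-export([miss_distance/2]).

%% Distance in metres along a circular orbit of radius R (metres) that
%% corresponds to an angular error of Degrees.
miss_distance(R, Degrees) ->
    R * Degrees * math:pi() / 180.
```

An error of 0.005° at a radius of 7.0e6 m is about 611 m, comfortably inside the 1km window, while the same angular error at 4.2e7 m is about 3.7 km, well outside it.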

We tried going too far out and ahead of the target, and then trying again in the hope that it’d be easier to be more accurate if we were closer to the target to start with. We tried going just not far enough and behind the target and trying again, all to no avail. The problem there is that when you’re a long way out, each step of the simulator counts for several thousand miles, and if your window of when you need to fire your thruster to hit the target is quite small, you can frequently miss it between two steps. We actually discovered that it was much much better to try going out, if you miss it, come all the way back in to, say an orbit of 1.025 * EarthRadius and then going back out — this was faster and gave us many more attempts, and as we were orbiting the Earth so quickly, we could afford to wait until the stars were in alignment, a particular simulator step showed us that we were really really close to the ideal time to fire, and to try again. Sadly, again due to the differences between the continuous maths we were doing and the discrete maths of the simulator, after a few outs and ins, we noticed we were very much not in a circular orbit around Earth when we thought we were. Rounding errors had rather crept up on us.

So that is where we got stuck. We spent many hours arguing about maths and angles. About the only maths we didn’t argue about was vectors, but then we’d previously (re-)written (several dozen) vector maths libraries as a result of games we’ve written (or attempted) so those libraries are reasonably solid. The visualiser worked really really well: our friend with the uber-fast C++ VM didn’t have one, and it showed, when he was staring at columns of numbers not understanding what his satellite was doing. Really, we needed to throw all our maths away and implement an iterative solver, effectively cloning the maths the simulator was doing. Only this way could we be accurate enough. But neither of us has maths strong enough to solve those equations, or at least enough clue to work out where to start. Nevertheless, at one point, we were in 52nd position, which is a stunning improvement from last year, though we have dropped back a bit now as other teams are still working on it. It finishes at 7pm BST today, though we’re not going to do any further work on it.
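
The core of such an iterative solver would be a function that advances a satellite by one discrete step, so that candidate burns can be tried out ahead of time. The sketch below is only a starting point under stated assumptions: it uses a generic velocity-Verlet step as a stand-in, since the simulator's exact update rule is not reproduced here, and the module name orbit_step and its functions are hypothetical.

```erlang
-module(orbit_step).
-export([step/3, run/4]).

%% State is {{X, Y}, {Vx, Vy}} in metres and metres per second, with the
%% central body at the origin. Mu is its gravitational parameter.

%% Gravitational acceleration at position {X, Y}.
accel(Mu, {X, Y}) ->
    R = math:sqrt(X * X + Y * Y),
    K = -Mu / (R * R * R),
    {K * X, K * Y}.

%% Advance the state by one velocity-Verlet step of Dt seconds.
step(Mu, Dt, {{X, Y} = P, {Vx, Vy}}) ->
    {Ax, Ay} = accel(Mu, P),
    P1 = {X + Vx * Dt + 0.5 * Ax * Dt * Dt,
          Y + Vy * Dt + 0.5 * Ay * Dt * Dt},
    {Ax1, Ay1} = accel(Mu, P1),
    V1 = {Vx + 0.5 * (Ax + Ax1) * Dt,
          Vy + 0.5 * (Ay + Ay1) * Dt},
    {P1, V1}.

%% Apply N steps in a row.
run(_Mu, _Dt, 0, State) ->
    State;
run(Mu, Dt, N, State) when N > 0 ->
    run(Mu, Dt, N - 1, step(Mu, Dt, State)).
```

A solver built on this would run the stepper forward from the current state for each candidate burn time and pick the one whose predicted position ends up closest to the target, instead of trusting the continuous-maths answer.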

For those interested, the 3rd part of the question was the same as the 2nd part — i.e. meet up with a different satellite, except that now the target satellite can be in an elliptical orbit, and you can start in an elliptical orbit too. The fourth part was to add in an element of AI so that you can visit several randomly positioned satellites, stopping at a special refueling satellite if necessary to refill, and doing it as fast as possible. Oh yes, and the Moon makes an appearance in the fourth part, adding much more fun to your equations (figure-of-eight orbits around the Earth and Moon anyone?!).

We took it at a much more relaxed pace this year, I even had to go out to play in a concert at one point, though it was mercifully brief, but had more fun than last year, did better, and didn’t make anywhere near as spectacular mistakes!

by matthew at June 29, 2009 10:45 AM

June 24, 2009

Damien Katz

StackOverflow Podcast

Yesterday I did a StackOverflow podcast with Joel Spolsky and Jeff Atwood. We talked about CouchDB and Erlang, among other things: StackOverflow Episode 59

by Damien Katz at June 24, 2009 07:27 PM

Erlang Training and Consulting

11 June 2009: Get a 35% discount on the latest Erlang book

Francesco Cesarini and Simon Thompson will be signing their new book Erlang Programming at the Erlang Factory. Buy it on the day and get a 35% discount off the cover price.

June 24, 2009 05:01 PM

08 June 2009: Open Source Light Weight HTTP 1.1 Client

After recent experiences and discussions with the Erlang community, ETC have decided to develop and release an HTTP/1.1 lightweight client. You can find more information in the Erlang Open Source section of our website or in the contributions section on Trapexit.

June 24, 2009 05:01 PM

13 May 2009: Erlang Training and Consulting at OSCON 2009!

Erlang Training and Consulting has been selected to represent Erlang at O’Reilly’s Open Source Convention in San Jose, California July 20 - 24, 2009. We will jumpstart the week with a tutorial on Practical Erlang Programming and follow up with Erlang for Five Nines, a non technical introduction to Erlang. If you are planning on attending, use discount code OS09PGM to receive a 15% discount. See you there!

June 24, 2009 05:01 PM

29 March 2009: Sponsoring the 2009 Powered By Erlang Eurobot Team!

Erlang Training & Consulting is co-sponsoring the development of a “Powered by Erlang” IANO Robot. This Intelligent Autonomous Nasty Object will compete against other international solutions in the 2009 Eurobot competition, building (and destroying) a temple! It is developed by the Computer and Telecommunication Engineering department of the University of Catania (Italy). For more information click here...

June 24, 2009 05:01 PM

03 March 2009: Meet us at QCON London 2009!

Erlang Training and Consulting’s Francesco Cesarini and Ulf Wiger will be giving a presentation on Erlang and Multicore and one on Concurrent Erlang Architectures at QCON in London next week. We will also be track hosts for the Functional and Concurrent Programming Languages Applied track. See you there!

June 24, 2009 05:01 PM

16 February 2009: Welcoming Ulf Wiger as Erlang Training and Consulting's New CTO

Ulf has used Erlang since 1992, and made a name for himself as the Chief Designer of Ericsson's AXD 301 - for many years the main showcase of Erlang's strengths. With his 20-year track record of building high-availability systems and strong ties to the Erlang community, Ulf will be a powerful addition to our management team.

June 24, 2009 05:01 PM

04 February 2009: Announcing the 2009 SF Bay Area Erlang Factory!

The Erlang Factory comes to Palo Alto in the San Francisco Bay Area April 27th - May 1st! This promises to be the largest gathering of Erlang expertise since last year's eXchange in London. Confirmed speakers include Robert Virding, Ulf Wiger, Richard Carlsson, Kevin Smith, John Hughes, Ezra Zygmuntowicz, Mickael Remond and many many more. Tentative tracks include Tools and Gadgets, Erlang and TDD, Erlang and IM, Relax with CouchDB and Cool Applications. Running together with the Factory is the Erlang University - 3-day courses on Erlang, OTP Design Patterns, Quick Check and CouchDB, allowing you to combine a training course, talks and tutorials in the same week. Come and meet Erlang inventors and experts who have been using the language long before it was released as open source, network with committers of the open source applications or debate and discuss the latest features and libraries. For more information on the Erlang Factory, visit our Erlang Factory site.

June 24, 2009 05:01 PM

01 February 2009: Erlang in South America!

Erlang Training & Consulting gets its first contract in South America. This prestigious consulting job brings to six the number of continents on which it operates. Previous continents include Africa, Australia, Asia, North America and Europe. Is there anyone in Antarctica we can help?

June 24, 2009 05:01 PM

11 November 2008: Announcing our Training Schedule for 2009!

As a very busy and successful year with lots of scheduled courses draws to a close, we can now confirm our Training Schedule for 2009.

Next year we will be offering courses in Dubai and Singapore in addition to our usual ones in the UK, Sweden, Poland, the USA and South Africa. For details, please see our Training Schedule.

June 24, 2009 05:01 PM

03 November 2008: Releasing the Erlang Web 1.1 Platform as Open Source!

The Erlang Web is an open source framework for the rapid deployment of web based interfaces. By separating the HTML generation, glue and logic while retaining it in the same memory space, we provide a framework which gives the developer better control of content management where reusability is the key. Erlang Training and Consulting has been using the Erlang Web in commercial applications for three years, and in order to increase the user base and available generic components, has decided to release it as open source. For more information and to download the latest version of the code, read documentation and view examples, presentations and tutorials, visit the Erlang Web site. You can also meet us at the Erlang User Conference, where we will be presenting the Erlang Web.

June 24, 2009 05:01 PM

Jan Lehnardt

EU Summer Tour

After the US Spring Tour in April this year, I’m about to embark on the EU Summer Tour.

I’ll be visiting London, Amsterdam, Zurich and Gran Canaria. Here’s when, how and why:

June 22nd–26th: CouchDB University & Factory, London

The CouchDB University is a three-day training course where J Chris and I teach a select group of students everything about CouchDB. Even if you arrive with little prior knowledge, you’ll leave able to build amazing CouchDB applications at small and large scale as well as extend CouchDB itself.

The CouchDB Factory is a track at the Erlang Factory running all day Friday.


June 29th–30th: Kings of Code, Amsterdam

Kings of Code looks like it is going to be a kick-ass web developer conference featuring some of my favourite web people: Geoffrey Grosenbach, Joe Stump & Francisco Tolmasky. I’m fairly confident that the other speakers will be among my favourites after Kings of Code :)

J Chris will be talking about CouchDB.

It’s still in discussion, but I might talk about CouchDB and Erlang for web developers at one of the side events.


July 1st–2nd: ICOODB, Zurich

I’ll be taking the night train from Amsterdam to Zurich to give a three-hour tutorial as well as a 60-minute presentation on CouchDB at the International Conference on Object Databases. CouchDB is not strictly an object-oriented database, but it stores objects and is of interest to the research community that meets in Zurich.

Prof. Stefan Edlich invited me to speak at ICOODB and I’m very happy I can make it.


July 1st–7th: GUADEC, Gran Canaria

Canonical, the kind folks behind the Ubuntu Linux distribution, are pushing CouchDB to become a centerpiece of the Ubuntu desktop data synchronization infrastructure. Merrily sync your contacts and calendar data between your machines and an online backup service, and share select data with your peers. And yeah, UbuntuOne is also related :)

Canonical is flying me out to attend the Linux Desktop Summit to talk to desktop application developers and show them how cool CouchDB is and where it is useful for them.

Also, Gran Canaria, I couldn’t say no. Thank you Canonical!


As much as I am excited about the travels and meeting all of you out there, I’ll be missing three weeks in my favourite city, Berlin, and it makes me a little sad.

by Jan (jan@apache.org) at June 24, 2009 06:19 AM

June 22, 2009

Debasish Ghosh

scouchdb Views now interoperable with Scala Objects

In one of the mail exchanges that I had with Dick Wall before the scouchdb demonstration at JavaOne ScriptBowl, Dick asked me the following ..

"Can I return an actual car object instead of a string description? It would be killer if I can actually show some real car sale item objects coming back from the database instead of the string description."

Yes, Dick, you can, now. scouchdb now offers APIs for returning Scala objects directly from couchdb views. Here's an example with Dick's CarSaleItem object model ..

// CarSaleItem class
@BeanInfo
case class CarSaleItem(make : String, model : String, 
  price : BigDecimal, condition : String, color : String) {

  def this(make : String, model : String, 
    price : Int, condition : String, color : String) =
    this(make, model, BigDecimal.int2bigDecimal(price), condition, color)

  private [db] def this() = this(null, null, 0, null, null)

  override def toString = "A " + condition + " " + color + " " + 
    make + " " + model + " for $" + price
}


The following map function returns the car make as the key and the car price as the value ..

// map function
val redCarsPrice =
  """(doc: dispatch.json.JsValue) => {
        val (id, rev, car) = couch.json.JsBean.toBean(doc, 
          classOf[couch.db.CarSaleItem]);
        if (car.color.contains("Red")) List(List(car.make, car.price)) else Nil
  }"""


This is exciting. The following map function returns the car make as the key and the car object as the value ..

// map function
val redCars =
  """(doc: dispatch.json.JsValue) => {
        val (id, rev, car) = couch.json.JsBean.toBean(doc, 
          classOf[couch.db.CarSaleItem]);
        if (car.color.contains("Red")) List(List(car.make, car)) else Nil
  }"""


And now some regular view setup code that registers the views in the CouchDB design document.

// view definitions
val redCarsView = new View(redCars, null)
val redCarsPriceView = new View(redCarsPrice, null)

// handling design document stuff
val cv = DesignDocument("car_views", null, Map[String, View]())
cv.language = "scala"

val rcv = 
  DesignDocument(cv._id, null, 
    Map("red_cars" -> redCarsView, "red_cars_price" -> redCarsPriceView))
rcv.language = "scala"
couch(Doc(carDb, rcv._id) add rcv)


The following query returns JSON corresponding to the car objects being returned from the view ..

val ls1 = couch(carDb view(
  Views builder("car_views/red_cars") build))


On the client side, we can do a simple map over the collection that converts the returned collection into a collection of the specific class objects .. Here we have a collection of CarSaleItem objects ..

import dispatch.json.Js._;
val objs =
  ls1.map { car =>
    val x = Symbol("value") ? obj
    val x(x_) = car
    JsBean.toBean(x_, classOf[CarSaleItem])._3
  }
objs.size should equal(3)
objs.map(_.make).sort((e1, e2) => (e1 compareTo e2) < 0) 
  should equal(List("BMW", "Geo", "Honda"))


But it gets better than this .. we can now fetch Scala objects from the view query directly through the scouchdb API ..

// ls1 is now a list of CarSaleItem objects
val ls1 = couch(carDb view(
  Views builder("car_views/red_cars") build, classOf[CarSaleItem]))
ls1.map(_.make).sort((e1, e2) => (e1 compareTo e2) < 0) 
  should equal(List("BMW", "Geo", "Honda"))


Note the class being passed as an additional parameter in the view API. Something similar is also supported for views that have reduce functions. This makes interoperability between the JSON storage layer and the object-based application layer more seamless in scouchdb.

Have a look at the project home page and the associated test case for details ..

by Debasish (ghosh.debasish@gmail.com) at June 22, 2009 12:20 PM

June 21, 2009

Caoyuan's Blog

Erlang Plugin Version 1 for NetBeans 6.7 Released

I'm pleased to announce that Erlang plugin (ErlyBird) version 1 for NetBeans 6.7 has been released.

NetBeans 6.7 RC3 or above is a requirement.

What's new:

  • It's rewritten in Scala instead of Java
  • More reliable instant rename
  • Displays documentation extracted from source comments during auto-completion.

To download, please go to: https://sourceforge.net/project/showfiles.php?group_id=192439&package_id=226387

To install:

  • Open NetBeans, go to "Tools" -> "Plugins", click on the "Downloaded" tab, click the "Add Plugins..." button, choose the directory where the Erlang plugin was unzipped, select all listed *.nbm files, and follow the instructions.
  • Make sure your Erlang bin path is in the OS environment PATH. You can also check/set your OTP path: from [Tools]->[Erlang Platform], fill in the full path of your 'erl.exe' or 'erl' file in "Interpreter", for instance "C:/erl/bin/erl.exe", or open the "Browse" dialog to locate the Erlang installation.
  • When you open/create an Erlang project for the first time, the OTP libs will be indexed. Grab a coffee and wait; the indexing time varies from 10 to 30 minutes depending on your computer.

Feedback and bug reports are welcome.

by dcaoyuan at June 21, 2009 02:04 AM

June 19, 2009

Jan Lehnardt

Caveats of Evaluating Databases

This is part two in a small series about measuring software performance. There’s a lot of common sense covered, but I feel it necessary to shed some light.

If you haven’t, check out part one.


Say you want to find out what’s behind the buzz of all these new #nosql databases. There’s a large number to choose from today. All options come in varying degrees of maturity and characteristics so it’d be nice to know what solves your problem best. A non-exhaustive list of these databases or storage systems includes Memcache[DB], Tokyo Cabinet / Tyrant, Project Voldemort, Scalaris, Dynamite, Redis, Persevere, MongoDB, Solr or my favourite, CouchDB. And these are just some of the open source ones.

This article is not a comprehensive comparison of any of the mentioned systems. Instead it tries to give you an idea about what to look for when evaluating a storage system or how to take into perspective evaluations and benchmarks others have done.

We’ll look at some of the technical aspects of data storage systems: Applying common sense when reading benchmarks; b-trees and hashing; speed vs. concurrency; networked systems and their problems; low level data storage (disks’n stuff); and data reliability on single-nodes and multi-node systems.

There are a lot of other reasons to decide for or against a project based on a lot of non-technical criteria, but things like commercial support or a healthy open source community are not part of this article.

Astounding Numbers

From time to time you see some crazy numbers posted to the reddits of the internets that claim fantastic performance.

The (imaginary) SuperfastDB can store 450,000 items per second!

Wow.

No word on where the items are stored (in memory? on a harddrive? Spindles? Solid State?), what an item is exactly and how big it is, the rest of the hardware this was run on and how to reproduce it.

But boy, 450,000 a second!

My shoes can do 650,000 a second, but you’ve got to figure out what.

Context is as important as reproducibility. The last article here established that finding out that my system and your system come up with different numbers is not much of a help. Any sort of serious test must come with a set of scripts or programs and comprehensive instructions on how the tests were run.


Everything “cool” in computer science has been around for 25+ years. Actual innovation is rare. Advancements in hardware and new combinations of existing solutions make for new stuff coming out each day (that’s a good thing), but the fundamental rules are the same for all. We’re all running von Neumann machines, quicksort is still pretty quick and hashes and b-trees rule the storage world.

Let’s recap.

Hashes & Trees

Hashing revolves around the idea of O(1) lookups. Allocate a number of buckets, create a function that gives you the number of a bucket for any data item you might want to store, and make sure no two data items hit the same bucket (or work around that). At runtime, a lookup or a store only needs to ask your function which bucket to use. The catch is the allocation of your set of buckets: if you need to store more items than you have buckets, some more work is required, which gives you O(N) operations that you can’t ignore in practice.
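The bucket idea and the O(N) cost of running out of buckets can be sketched in a few lines of Python. This is a toy, not how any of the databases mentioned here store data: the class name `BucketTable`, the doubling strategy and the `rehashed_items` counter are all made up for illustration.

```python
class BucketTable:
    """Toy hash table: chained buckets, with an O(N) rehash when it fills up."""

    def __init__(self, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0
        self.rehashed_items = 0  # how many items resizes have had to move so far

    def _index(self, key, n_buckets=None):
        # The "function that gives you the number of a bucket" for any key.
        return hash(key) % (n_buckets or len(self.buckets))

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # same key: replace in place
                return
        bucket.append((key, value))       # collision workaround: chain in the bucket
        self.count += 1
        if self.count > len(self.buckets):
            self._grow()

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

    def _grow(self):
        # More items than buckets: every stored item has to be placed again.
        new_size = len(self.buckets) * 2
        new_buckets = [[] for _ in range(new_size)]
        for bucket in self.buckets:
            for key, value in bucket:
                new_buckets[self._index(key, new_size)].append((key, value))
                self.rehashed_items += 1
        self.buckets = new_buckets
```

Storing ten items in a table that starts with four buckets triggers two resizes, and those resizes touch every item stored so far. That is the extra work a benchmark can hide by pre-allocating enough buckets.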

The other elephant in the room is the b-tree. The fundamental idea here is to get to your data in a minimal number of steps when traversing a tree, because making a step is expensive but reading your data is comparatively fast. Steps are expensive because they translate to a head seek (that is, the time your spinning hard drive needs to position the reading arm over the spot to read your data from), whereas reading from a hard drive once the head is in place is fast.
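A quick bit of arithmetic shows why a wide tree matters. The sketch below counts how many levels a tree needs to reach a given number of items. The function name, the item count and the 10 ms seek cost are assumptions picked for illustration, not measurements.

```python
def levels_needed(n_items, fanout):
    """Smallest number of levels such that fanout ** levels >= n_items."""
    levels, reach = 1, fanout
    while reach < n_items:
        reach *= fanout
        levels += 1
    return levels


SEEK_MS = 10  # assumed cost of one head seek; a made-up round number

for fanout in (2, 100):
    steps = levels_needed(1_000_000_000, fanout)
    print(f"fanout {fanout}: {steps} steps, about {steps * SEEK_MS} ms of seeking")
```

With a billion items, a binary tree needs 30 steps while a tree with 100 children per node needs 5. If every step were a seek, that is the difference between roughly 300 ms and 50 ms per lookup under the assumed seek cost.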

There are a bunch of more interesting lookup structures, like R-Trees for spatial queries, but they are mostly used for secondary indexes on top of a regular data set that lives in a hash or b-tree.

Concurrency vs. Speed

Concurrency is hard. The devil lies in the details and when briefly looking at things, the details are often overlooked. Suits the devil.

Creating storage systems that assume only one access occurs at a time is relatively easy. If resources are shared concurrently, things become tricky. The two larger schools of thought (and practice) are locking and no-locking (heh).

Locking means that the database has to maintain information for everybody who wants to write to a part of the database, and what part it is.

No locking, or optimistic locking, or MVCC, moves that burden to the person who is trying to write to the database. She must prove that she won’t be overwriting any existing data.

The trade-off here is leaner request handling on the server, which works well with remote & concurrent clients, at the expense of more complexity on the client (the person who wants to store something in our database).

Hybrid approaches are possible too: While MVCC is used internally, the database’s clients can rely on database-side locking (e.g. PostgreSQL or InnoDB).
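To make the no-locking idea concrete, here is a minimal sketch of a store where the writer has to prove she has seen the current version. `RevisionStore`, `Conflict` and the integer revisions are hypothetical names for this illustration and are not the API of CouchDB or any other database named here. A real server would also have to make the check-and-write step atomic.

```python
class Conflict(Exception):
    """Raised when a writer's view of the data is out of date."""


class RevisionStore:
    def __init__(self):
        self._docs = {}  # key -> (revision, value)

    def get(self, key):
        return self._docs[key]

    def put(self, key, value, expected_rev=None):
        current_rev = self._docs.get(key, (None, None))[0]
        if current_rev != expected_rev:
            # The writer did not base her change on the latest revision.
            raise Conflict(f"{key}: have revision {current_rev}, writer expected {expected_rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[key] = (new_rev, value)
        return new_rev
```

The server keeps no list of who intends to write what. A client with a stale revision simply gets a `Conflict` and has to fetch, merge and retry, which is the extra client-side complexity mentioned above.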

Networks

Just a quick note: We already talk about client and server here. There is a strong case for embedded databases like SQLite that don’t expose a concurrent user model to the outside. The program that needs an embedded database just includes it.

Another approach to using databases is having a dedicated computer run a database system and share it over the network with any number of clients. The “server” can also be “a bunch of servers” or a cluster. More on that later.

A separate database server (networked or not) will need to spend some time to deal with connections, network failures, unspecified client behaviour and so on. The upside is a piece of infrastructure that can be maintained separately. An embedded database will thus be faster but probably won’t solve all of your problems and it will always be tied to your application.

fsync(): Reliability vs. Speed

When people tell me “SuperfastDB does 450,000 a second!” I ask “How many fsync()s is that?”. Let me explain:

A database system uses operating system services to use any hardware. The operating system exposes a harddrive through a filesystem. The database system talks to the filesystem and asks it to store or retrieve data on its behalf. The filesystem then goes ahead and tries to satisfy the database’s requests.

(I’ll not talk about databases that can use raw block devices to store data. They exist but they are not as common as those that use the filesystem.)

The filesystem also tries to be clever – for good reasons. When the database requests a piece of data, the filesystem will not only find that piece and return it, it will also store it in a cache to avoid having to actually talk to the harddrive the next time this piece of data gets requested. When the data changes, the filesystem either removes it from the cache or updates it in the cache along with the harddrive. It might even go further and only store the new data that comes in with a write request in the cache and rely on a periodic task to write all of the cache back to the drive. Writing a bunch of pieces at once is more efficient than storing each one on its own.

More efficient equals faster, and faster is good, right? Well, it depends: if all goes well, this approach is a nice one. But you know computers; things will not go well 100% of the time. The failure scenarios are endless, but they boil down to the question: “What happens when your machine dies and you have data that has only been written to memory?” The answer isn’t too hard: that data is lost. If there is a delay between a write request finishing and data being written (or “flushed”) to disk, any data that has been “written” during the delay period is subject to loss.

There are cases where this is not a problem; in other cases it is. A developer should have the chance to decide. (Note that even your hardware could be lying to you about having stored data, but I’ll punt on this one: get proper hardware.)

So, flushing to disk needs to happen before you can rest assured your data has been stored. Your operating system has an API call that forces the filesystem to write its cache to disk. It is called fsync() (on UNIX systems) and it is an expensive operation. You can only do so many fsync()s in a second and it is not a great many.

The 450,000 items were most likely just written to memory and not to disk.
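If you want to see the gap on your own machine, a rough sketch like the following compares writing with and without an fsync() per item. The function names and the 100-byte item size are arbitrary choices for illustration, and the numbers depend entirely on your disk, filesystem and operating system, so none are quoted here.

```python
import os
import tempfile
import time


def write_items(path, items, sync_each):
    """Write items to path, optionally forcing each one to disk before continuing."""
    with open(path, "wb") as f:
        for item in items:
            f.write(item)
            if sync_each:
                f.flush()             # hand Python's buffer to the operating system
                os.fsync(f.fileno())  # ask the OS to write its cache to the device


def items_per_second(n_items, sync_each):
    """Time write_items on a temporary file and return the observed rate."""
    items = [b"x" * 100 for _ in range(n_items)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        write_items(path, items, sync_each)
        elapsed = time.perf_counter() - start
        return n_items / elapsed if elapsed > 0 else float("inf")
    finally:
        os.remove(path)
```

Calling `items_per_second(1000, sync_each=False)` and then `items_per_second(1000, sync_each=True)` on the same machine gives two figures for the same workload. Only the second one says anything about data that would survive a power cut, and even that assumes the hardware honours the flush.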

Space & In-Place

When writing the files that represent what lives in a database to disk (at the end of the day, your data ends up in one file or another on the filesystem), there are multiple options for handling updates.

An update is a change to your data item, for example, a new phone number. The intuitive way to handle this is to go and find the old phone number in the file, and overwrite it with the new number. Easy.

There are several problems with this approach: What to do if the new phone number is longer than the old one (say you added an international calling prefix)? The new number needs to be written to a different place and the change in location must be recorded. Not too big of an issue.

Back to failure scenarios: Again, the reasons can be manifold, but what happens when we’ve (over-)written the first 4 digits of the old number with the new one and then the power goes away or the database server crashes? The next time you want to read the phone number you get a mix of the old and the new one (if you are lucky), and you don’t know that this is the case or which parts are missing. Your database file is inconsistent and you need to run an integrity check to find missing bits and correct half-written bytes. In the worst case that means scanning your entire database file a few times before you have resolved all inconsistencies. If you have a lot of data, that can take days.

To solve this, you always write the new phone number to a new place in the database file, and only when it has been fsync()ed to disk do you update the location of the phone number (and then flush that update to disk as well). You will never end up in a scenario where your database file is in an inconsistent state, and after a crash you are back online without an integrity check.

The trade-off is write speed (remember, fsync()s are expensive) in exchange for not needing a lengthy consistency check after a failure.

A nice bonus is that if the “new place in the database” is the end of the file, you keep your disk-drive head busy with writing data to disk instead of seeking all over the place (remember: seeks are expensive).
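The write-then-point scheme can be sketched as an append-only file in which every commit is followed by a small footer that says where the value lives. The footer layout, the `PTR!` marker and the function names below are invented for this example and are not the on-disk format of any particular database. A production format would also carry a checksum, and would not read the whole file into memory to find the last footer.

```python
import os
import struct

FOOTER = struct.Struct(">QI4s")  # value offset, value length, marker
MARKER = b"PTR!"


def commit(path, value):
    """Append the value, sync it, then append and sync a footer pointing at it."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(value)
        f.flush()
        os.fsync(f.fileno())  # the data is on disk before anything refers to it
        f.write(FOOTER.pack(offset, len(value), MARKER))
        f.flush()
        os.fsync(f.fileno())  # now the pointer is on disk as well


def read_latest(path):
    """Return the value named by the last intact footer, or None if there is none."""
    with open(path, "rb") as f:
        data = f.read()
    end = len(data)
    while end >= FOOTER.size:
        offset, length, marker = FOOTER.unpack_from(data, end - FOOTER.size)
        if marker == MARKER and offset + length == end - FOOTER.size:
            return data[offset:offset + length]
        end -= 1  # step back over a partial write left behind by a crash
    return None
```

If the process dies halfway through writing a new number, the tail of the file is garbage but the previous footer is untouched, so `read_latest` still returns the old, complete value. Note the two fsync() calls per commit: that is the write-speed price described above.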

Distribution, Sharding & Resharding

So far, we’ve been looking at scenarios that involve a single database. We learned a great deal (I hope), but in reality we often deal with more than one database. The simplest reason to have two databases is redundancy. Failures can bring down your database temporarily or even permanently. If it is a temporary issue, waiting a bit (or a bit longer) to get up and running again might be an option, but often an application or service should be available at all times. In a fatal failure, where a database server is lost beyond repair, your data is gone if you haven’t stored it in a second place.

“I’ll just make two copies, easy!”. Yup easy, until you look at the details (that damn devil again!).

It’s all about failures again. Consider a single read request. A client connects to a server and asks for a data item. The server looks it up and returns the data to the client. All is well. At any point things can go wrong. The network connection can drop (or slow down so much that client or server assume it dropped), the client can disappear (because of a network failure or crash) as can the server. Clients, servers and the protocols they speak need to be built around the assumption that any of these things (and many more) can go wrong. If any part is not designed to handle error cases, your system will do funny things, but it won’t reliably store and manage your data.

Add complexity: With each write target (store in two places) the possibility of error and the need for proper error handling grows exponentially. When evaluating a distributed storage system, looking at how errors are handled is vital.


Another reason to distribute data among multiple servers is capacity. The three metrics of interest here are read requests, write requests and data. If you have more requests or data than a single machine can handle, you need to move to multiple machines. Each metric calls for different strategies, but they often go along with each other. The need for fault tolerance that I discussed above needs to be considered alongside.

Growing read capacity is relatively easy once you have covered the base case where the source for reading data might not be the same as the target for writing data and that there can be a mismatch (cf. eventual consistency).

Distributing writes and data works by assigning each of two machines 50% of the operations. A clever intermediate, a proxy server for example, decides which request goes where and all is well: we can store twice as much and we can store at twice the speed. When we need to grow bigger yet, we add another server and tell the proxy server to distribute the load equally among them. Adding a proxy for distribution introduces a single point of failure and you don’t want these; there’s added complexity with this approach.

resharding.png

The diagram shows that there is another step needed that wasn’t included in the above description. The new “node” needs to have a copy of all data items that are assigned to it and are currently living on the two existing nodes. The process of moving data items to new nodes is called resharding and needs to happen every time a new node is added.

Resharding can be an expensive operation if you have a lot of data. Techniques like consistent hashing help with minimising the number of items that need to move. If you are looking at a sharding database, you want to understand how the sharding is performed and if you like the trade-offs.
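How much data moves when a third node joins depends heavily on how keys are assigned to nodes. The sketch below compares plain "hash modulo number of nodes" with a simple consistent-hash ring. The node names, the 100 points per node and the use of MD5 as a stable hash are arbitrary choices for illustration, not a description of how any particular database shards.

```python
import bisect
import hashlib


def stable_hash(text):
    """Deterministic integer hash (Python's built-in hash() of strings varies per process)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def modulo_node(key, nodes):
    """Naive placement: hash the key and take it modulo the number of nodes."""
    return nodes[stable_hash(key) % len(nodes)]


class Ring:
    """Consistent-hash ring: each node owns many points; a key goes to the next point."""

    def __init__(self, nodes, points_per_node=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key):
        i = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(keys, before, after):
    """Share of keys whose node changes between two placement functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"item-{i}" for i in range(10_000)]
two, three = ["a", "b"], ["a", "b", "c"]
modulo_moved = moved_fraction(
    keys, lambda k: modulo_node(k, two), lambda k: modulo_node(k, three)
)
ring_two, ring_three = Ring(two), Ring(three)
ring_moved = moved_fraction(keys, ring_two.node_for, ring_three.node_for)
```

With modulo placement roughly two thirds of the keys change nodes when going from two to three machines, including keys shuffled between the two old nodes. With the ring, the only keys that move are the ones the new node takes over, which should come to about a third here.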

CAP Theorem

The CAP Theorem states that out of consistency, availability and partition tolerance, a system can choose to support two at any given moment, but never three.

cap.png

Consistency guarantees that all clients that talk to a cluster of nodes will always get to read the same data. Write operations are atomic on all nodes.

Availability guarantees that in any (reasonable) failure scenario, clients are still able to access their data.

Partition tolerance guarantees that when nodes in the cluster lose their network connection and two or more completely separated sub-clusters emerge, the system will still be able to store and retrieve data.

Please Talk! (To Developers)

If you are aiming for a comparative benchmark of two or more systems, you should run your procedure by their authors. I found developers are happy to help out with benchmarks by clearing up misconceptions or sharing tricks to speed things up (which you can choose to ignore, if you are looking for an out-of-the-box comparison, but this is rarely useful).

by Jan (jan@apache.org) at June 19, 2009 11:16 PM

Hypothetical Labs

Win32 Linked-In Drivers and A Project Idea

I’ve been living in linked-in driver land for the past few weeks. I guess that’s what I get for publishing basic_erl_driver. It’s been interesting work so I certainly haven’t minded doing it.

Well, I didn’t mind it until I had to build one of my drivers on Windows. It’s been a long time since I’ve done much Win32 development but I used to know my way around Visual C++ pretty well. “I used to do this shit for a living. How hard could this be?”, I thought to myself.

Three words: famous last words.

Three more words: C++, linker, XML.

Two final words: sheer frustration.

After wrestling with Visual Studio Express C++ Edition 9.0 (or whatever crazy name MS uses) for a day and a half I finally have a working driver. Here’s the nuggets o’ knowledge I learned along the way. These notes assume you’re porting a working driver from OS X/Linux/BSD to Windows although most of them are applicable if you’re starting from scratch on Windows, too. They also assume you’re building code from inside of Visual Studio because, frankly, I don’t care enough about Windows to figure out nmake, too.

  • Start with a blank Win32 DLL project. Don’t let Visual Studio create stdafx and friends. Down that path lies madness. You’re much better off to create a blank project and import your code.
  • Add the erts-<version_number>/include directory to your project’s include path. You’ll need this so the compiler can resolve erl_driver.h.
  • Don’t forget to include string.h before erl_driver.h. erl_driver.h uses memcpy() without including string.h. Including string.h first will fix a compiler warning and a possible runtime crash.
  • Very Important: Right-click your project and navigate to the Linker -> Manifest File option. Enter this magical text into the Additional Manifest Dependencies field:
    type='win32' name='Microsoft.VC90.CRT' version='9.0.21022.8' processorArchitecture='x86' publicKeyToken='1fc8b3b9a1e18e3b'
    

    I believe this tells the compiler to link the DLL with the Visual C++ 9.0 runtime. This is the equivalent of the C stdlib on other platforms, I think. Erlang will be unable to load your driver if you skip this step.


  • Observation: Who in their right fucking mind thought it was a good idea to add XML descriptors to the link step of building a C library?! I take back every bad thing I’ve ever said about Java’s use of XML. SOAP, with all of its angle brackety warts, almost looks sane in comparison.

  • With the previous step accomplished you’ll need to distribute the VC++ runtime with your driver. Luckily, MS provides the code in a redistributable form inside the Microsoft Visual Studio 9.0\VC\redist\x86 directory structure. Grab all of the files in there and plop them into the same directory as your driver’s DLL file.
  • There are probably better ways to accomplish the same goal given my dated platform knowledge. If you know of a better way to do this, please drop a comment and set me straight!

    One of the drivers I’ve been working on is to allow SpiderMonkey to interface with Erlang. I’ve asked and am waiting for permission from my client to open source the code since I think it’d be generally useful. If that doesn’t pan out I’m strongly considering doing another driver — open source from the beginning — to interface with V8 or possibly WebKit’s engine. I’ve got some ideas on how JSON could smooth the interop between the two languages. Anyone up for either working on or sponsoring work on this?

    by kevin at June 19, 2009 02:29 AM

    June 17, 2009

    Dukes of Erl

    Keeping it simple with flatula

    Paul has blogged about overcoming mnesia performance issues in the past, but I don't think we've talked much about the ultimate strategy -- keeping data out of mnesia altogether.

    When we first started serving ads, we stored information about every single ad impression in a huge mnesia database, for retrieval on click, and for building behavioral profiles. Almost needless to say, this didn't scale very far. We spent many a day last summer delving into mnesia internals, fixing corrupted table fragments after node crashes, bemoaning how long it took new nodes to join the schema under heavy load, and so on.

    One of the simplest and most effective changes that got us out of this mess was not to store any per-impression data in mnesia at all -- instead, we started logging the data to flat files on disk, and storing a small pointer to the data in a cookie so we could read it back the next time we saw the user. Hardly a revolutionary solution . . . it's well-known that disk seeking is the enemy of performance. The hardest part was coming to realizations like, "Hmm, I guess we don't really care if a node goes down and we lose part of that data!"

    We've open-sourced one of the main components that enabled this strategy: flatula, an Erlang application that manages write-once "tables" that are really just collections of flat files. It looks a bit like dets, except that it doesn't support deletions, updates, or iteration, and you can't make up the keys. But when you don't need those things, it's hard to imagine a more efficient way to store data.
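The general shape of the idea (append a record to a flat file, hand back a small pointer, read it back later by pointer) can be sketched as follows. This is not flatula's API, which is Erlang and described in its own tutorial. `FlatLog` and the (offset, length) pointer format are made up for illustration.

```python
import os


class FlatLog:
    """Write-once log: append a record, get back an (offset, length) pointer."""

    def __init__(self, path):
        self._path = path
        open(path, "ab").close()  # make sure the file exists

    def append(self, record):
        # The store picks the "key": it is simply where the record landed.
        with open(self._path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(record)
        return offset, len(record)

    def read(self, pointer):
        offset, length = pointer
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

The pointer is small enough to fit in a cookie, and because the caller cannot choose it, the store needs no index, no updates and no deletions.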

    If you'd like to learn more, there's a brief tutorial on the Google Code site.

    by Michael Radford (noreply@blogger.com) at June 17, 2009 04:58 PM

    lambder

    How to redirect in webmachine

    Recently I was looking for an example of how to do a redirect in webmachine. Unfortunately I didn’t find one, so I started figuring it out myself. After some trial and error with the brilliant wmtrace_resource, it turned out to be trivial ;). Here is the example:

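    %% @doc Example of redirect webmachine_resource.

    -module(redirect_resource).
    -export([init/1, resource_exists/2, moved_temporarily/2, previously_existed/2]).

    -include_lib("webmachine/include/webmachine.hrl").

    %% No per-request state is needed.
    init([]) -> {ok, undefined}.

    %% Returning {true, Location} makes webmachine answer with a temporary
    %% redirect to Location. The target comes from the `site` path binding.
    moved_temporarily(ReqData, Context) ->
      Site = wrq:path_info(site, ReqData),
      Location = base64:decode(Site),  % base64:decode/1 returns a binary
      {{true, Location}, ReqData, Context}.

    %% Together these two callbacks tell webmachine that the resource used to
    %% exist but no longer does, so that moved_temporarily/2 is consulted.
    previously_existed(ReqData, Context) -> {true, ReqData, Context}.

    resource_exists(ReqData, Context) -> {false, ReqData, Context}.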

    The resource is mapped in my dispatch.conf as
    {["redirect", site], redirect_resource, []}.
    You can then make a request like http://host.com/redirect/aHR0cDovL2xhbWJkZXIuY29t
    The redirect_resource will interpret the last token in the path as a base64-encoded location to redirect to.
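    To build such a link you need the encoded token. A minimal sketch, assuming the target URL is an ordinary Erlang string; the module and function names are hypothetical and not part of webmachine:

```erlang
%% Sketch only: helper names are hypothetical, not part of webmachine.
-module(redirect_token_sketch).
-export([token/1, target/1]).

%% Encode a target URL (a string) for use as the last path segment.
token(Url) -> base64:encode_to_string(Url).

%% Decode the token again, mirroring what the resource does with `site`.
target(Token) -> base64:decode_to_string(Token).
```

    redirect_token_sketch:token("http://lambder.com") returns "aHR0cDovL2xhbWJkZXIuY29t", the token used in the request above. One caveat: the standard base64 alphabet contains "+" and "/", so some URLs will produce tokens that are not safe as a single path segment and would need a URL-safe encoding instead.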

    by Daniello at June 17, 2009 12:14 PM

    June 16, 2009

    Jan Lehnardt

    Benchmarks: You are Doing it Wrong

    This is part one in a small series about measuring software performance. There’s a lot of common sense covered, but I feel it is necessary to shed some light.


    Coffee

    Pete needs coffee and his coffee maker broke down. Pete’s browsing through Craigslist. He’s looking for a coffee maker and he’s fine with a used one if he can get it from nearby. While results may vary when Pete’s got his coffee, his brain processes what he sees on a web page in between 200 and 500 milliseconds. Of course this depends on the complexity of the page and outside distractions[citation needed].

    Computers are very limited in what they can calculate, but they are incredibly fast and reliable. Human brains are a lot more sophisticated, but not as fast on raw computations. Fetching the Craigslist homepage takes about 150ms right now (I’m in Berlin) when I ask curl, and it takes Safari around 1.4 seconds (1400ms) to display the page.

    This in part demonstrates the measuring dilemma. Pete never sees the 150ms response for http://craigslist.org/. He only sees that it takes a bit before his browser finishes loading. We’ll get back to that later.

    The point here is that once all parts of the system together produce a sub-200ms response time, Pete (and everybody else) would not notice any further speed-up. Pages would change “instantly” as far as he (and everybody else) is concerned. While the fallacies of distributed computing (read: The Internet) will probably never get us there, at some point it does not make sense to speed things up any more because no one will notice.

    Moving Parts

    Let’s take a look at what a typical web app looks like. This is not exactly how Craigslist works (because I don’t know how Craigslist works), but it is a close enough approximation to illustrate problems with benchmarking.

    You have a web server, some middleware and a database. A user request comes in; the web server takes care of the networking and parses the HTTP request. The request gets handed to the middleware layer, which figures out what to run and then runs whatever is needed to serve the request. The middleware might talk to your database and other external resources like files or remote web services. The request bounces back to the web server, which sends out any resulting HTML. The HTML includes references to other resources living on your web server, like CSS, JS or image files, and the process starts anew for every resource. It is a little different each time, but in general all requests are similar. And along the way there are caches to store intermediate results to avoid expensive recomputation.

    That’s a lot of moving parts. Getting a top-to-bottom profile of all components to figure out where bottlenecks lie is pretty complex (but nice to have). I’ll start making up numbers now; the absolute values are not important, only the numbers relative to each other. Say a request takes 1.5 seconds (1500ms) to be fully rendered in a browser.

    In a simple case like Craigslist there is the initial HTML, a CSS file, a JS file and the favicon. Except for the HTML, these are all static resources and involve reading some data from a disk (or from memory) and serving it to the browser, which then renders it. The most notable things to do for performance are keeping data small (gzip compression, high jpg compression) and avoiding requests altogether (HTTP level caching in the browser). Making the web server any faster doesn’t buy us much (yeah, hand wavey, but I don’t want to focus on static resources here). Pete wants his coffee. Let’s say all static resources take 500ms to serve & render.

    (Read all about improving client experience with proper use of HTTP from Steve Souders. The YSlow tool is indispensable for tuning a web site.)

    That leaves us with 1000ms for the initial HTML. We’ll chop off 200ms for network latency [cf. Network Fallacies]. Let’s pretend HTTP parsing, middleware routing & execution and database access share equally the rest of the time, 200ms each.

    If you now set out to improve one part of the big puzzle that is your web app and gain 10ms in the database access time, this is probably time not well spent (unless you have the numbers to prove it).
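    The made-up budget can be put into numbers. The sketch below assumes that "routing" and "execution" count as separate steps, so that four server-side steps of 200ms each add up to the remaining 800ms; the module and function names are hypothetical.

```erlang
%% Sketch only: the figures are the made-up numbers from the text.
-module(budget_sketch).
-export([percent_of/2, demo/0]).

%% How large is PartMs relative to WholeMs, in percent?
percent_of(PartMs, WholeMs) -> 100 * PartMs / WholeMs.

demo() ->
    StaticMs = 500,
    NetworkMs = 200,
    %% Assumption: four server-side steps sharing the remaining 800ms.
    ServerMs = [{http_parsing, 200}, {routing, 200},
                {execution, 200}, {database, 200}],
    TotalMs = StaticMs + NetworkMs + lists:sum([Ms || {_, Ms} <- ServerMs]),
    1500 = TotalMs,
    %% A 10ms gain, relative to the database step and to the whole request.
    {percent_of(10, 200), percent_of(10, TotalMs)}.
```

    Under these assumptions a 10ms gain is 5% of the database step but only about 0.67% of what Pete actually waits for.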

    Variables

    We established that there are a lot of moving parts. Each part has a variable performance characteristic, based on load, disk I/O, state of various caches (down to CPU L2 caches) and different OS scheduler behaviour based on any input variable. It is nearly impossible to know every interfering factor, so any numbers you ever come up with should be taken with a grain of salt. In addition, when my system reports a number of 1000ms and yours reports 1200ms, the only thing we can derive from that is that our systems are different, and we knew that before.

    To combat variables, usually profiles are run multiple times (and a lot of times!) to have statistics tell you the margin of error you’re getting. Profiles should also run a long time with the same amounts of data that you will see in production. If you run a quick profile for a few seconds or minutes, you will hit empty caches and get skewed numbers. If your data does not have the same properties as the data you have in your production environment, you’ll get skewed results.
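    As a rough illustration of "run it many times and let statistics tell you the margin of error", here is a minimal sketch that times a fun repeatedly with timer:tc/1 and reports the mean, the sample standard deviation and the standard error of the mean. The module and function names are hypothetical, and a handful of in-process runs like this is still subject to every caveat above (cold caches, unrealistic data).

```erlang
%% Sketch only: a tiny repeated-measurement helper, names are hypothetical.
-module(bench_sketch).
-export([run/2, stats/1]).

%% Run Fun N times and return the elapsed time of each run in microseconds.
run(Fun, N) when N > 1 ->
    [element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)].

%% Return {Mean, StdDev, StdErr} for a list of at least two samples.
stats(Samples) ->
    N = length(Samples),
    Mean = lists:sum(Samples) / N,
    Var = lists:sum([(X - Mean) * (X - Mean) || X <- Samples]) / (N - 1),
    StdDev = math:sqrt(Var),
    {Mean, StdDev, StdDev / math:sqrt(N)}.
```

    For example, bench_sketch:stats(bench_sketch:run(fun() -> lists:sort(lists:seq(100000, 1, -1)) end, 30)) gives a mean together with an idea of how much individual runs scatter around it. Two means that differ by less than a couple of standard errors should not be read as a real difference.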

    Story time: Chris tried to find out how many documents of a certain size he could write into CouchDB. CouchDB has a feature that generates a UUID for every new document you store. The UUID variant it uses has a full 128 bits of randomness. The documents are then stored in a b+-tree. It turns out that for a b+-tree, truly random keys are the worst possible case to handle for any kind of access. Chris then switched to pre-generated sequential ids for his test and got a 10x improvement. Now he’s testing the best case for CouchDB, which coincides with his application’s data, but your application might have a different key distribution, resulting in only a 2x or 5x improvement, or none at all.
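    To make the two kinds of test data concrete, here is a minimal sketch of both key shapes. It is not CouchDB's UUID code and not Chris's test; the module and function names are hypothetical. Random ids are 128 random bits printed as hex, and sequential ids are zero-padded counters, so they sort in the same order in which they are generated.

```erlang
%% Sketch only: two ways to generate document ids for a write benchmark.
-module(doc_id_sketch).
-export([random_id/0, sequential_ids/2]).

%% 128 random bits as 32 lowercase hex characters.
random_id() ->
    lists:flatten([io_lib:format("~2.16.0b", [B])
                   || <<B>> <= crypto:strong_rand_bytes(16)]).

%% Count zero-padded hex ids starting at From; they sort in insertion order.
sequential_ids(From, Count) ->
    [lists:flatten(io_lib:format("~32.16.0b", [N]))
     || N <- lists:seq(From, From + Count - 1)].
```

    Whichever shape you benchmark with, the point of the story stands: use the one that matches the key distribution of your real application.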

    In a different case, the amount of data stored and retrieved could easily fit in memory, and Linux’ filesystem cache was smart enough to turn all disk access into memory access, which is naturally faster. But that doesn’t help if your production setup has more data than fits in memory.

    Take home point: Profiling data matters.

    The second part of this little series will look at pitfalls when profiling storage systems.

    Trade Offs

    Tool X might give you 5ms response times and this is an order of magnitude faster than anything else on the market. Programming is all about trade-offs and everybody is bound by the same laws.

    On the outside it might appear that everybody who is not using Tool X is a moron. But speed & latency are only part of the picture. We already established that going from 5ms to 50ms might not even be noticeable by anyone using your product. The expense for speed can be multiple things:

    • Memory; instead of doing computations over and over, Tool X might have a cute caching layer that saves recomputation by storing results in memory. If you are CPU bound, that might be good, if you are memory bound it might not. A trade off.

    • Concurrency; the clever data structures in Tool X are extremely fast when only one request at a time is processed, and because it is so fast most of the time, it appears as if it would process multiple requests in parallel. Eventually though, a high number of concurrent requests fill up the request queue and response time suffers. A variation on this is that Tool X might work exceptionally well on a single CPU or core, but not on many, leaving your beefy servers idling.

    • Reliability; making sure data is actually stored is an expensive operation. Making sure a data store is in a consistent state and not corrupted is another. There are two trade offs here: Buffers that store data in memory before committing it to disk to ensure a higher data throughput. In case of a power loss or crash (hard- or software), the data is gone. This may or may not be acceptable for your application. The other is a consistency check that is required to run after a failure. If you have a lot of data, this can take days. If you can afford to be offline, that’s okay, but maybe you can’t afford it.

    Make sure to understand what requirements you have and pick the tool that complies instead of taking the one that has the prettiest numbers. Who’s the moron when your web application is offline for a fix-up for a day and your customers impatiently wait to get their job done, or worse, you lose their data?

    But…My Boss Wants Numbers!

    Yeah, you want to know which one of these databases, caches, programming languages, language constructs or tools is faster, harder, stronger. Numbers are cool and you can draw pretty graphs that management types can compare and make decisions from.

    First thing a good exec knows is that she’s operating on insufficient data (aside, everybody does all the time, but sometimes it is just not apparent to you) and diagrams drawn from numbers are a very distilled view of reality. And graphs from numbers that are effectively made up by bad profiling are not much more than a fairy tale.

    If you are going to produce numbers, make sure you understand how much is and isn’t covered by your results. Before passing them on, make sure the receiving person knows as much.

    A Call to Arms

    I’m in the market for databases and key-value stores. Every solution has a sweet spot in terms of data, hardware, setup and operation and there are enough permutations that you can pick the one that is closest to your problem. But how to find out? Ideally, you download & install all possible candidates, create a profiling test suite with proper testing data, make extensive tests and compare the results. This can easily take weeks and you might not have that much time.

    I would like to ask developers [*] of storage systems to compile a set of profiling suites that simulate different usage patterns of their system (read-heavy & write-heavy loads, fault tolerance, distributed operation and a lot more). A fault tolerance suite should include steps necessary to get data live again, like any rebuild or checkup time. I would like users of these systems to help their developers to find out how to reliably measure different scenarios.

    * I’m working on CouchDB and I’d like to have such a suite very much!

    Even better, developers could agree (hehe) on a set of benchmarks that objectively measure performance for easy comparison. I know this is a lot of work and the results can still be questionable (you read the above part, didn’t you?), but it’ll help our users a great deal when figuring out what to use.


    Stay tuned for the next part in this series about things you can do wrong when testing databases & k-v stores.

    by Jan (jan@apache.org) at June 16, 2009 04:06 PM

    June 15, 2009

    Debasish Ghosh

    Code Reading for fun and profit

    I still remember those days when APIs were not so well documented, and we didn't have the goodness that Javadocs bring us today. I was struggling to understand the APIs of the C++ Standard Library by going through the source code. Before that my only exposure to code reading was a big struggle to pile through reams of Cobol code that we were trying to migrate to the RDBMS based platforms. Code reading was not so enjoyable (at least to me) those days. Still I found it a more worthwhile exercise than trying to navigate through inconsistent pieces of crappy paperwork and half-assed diagrams that project managers passed on in the name of design documentation.

    Exploratory Code Reading ..

    The C++ Standard Library and Boost changed it all. C++ was considered to be macho enough those days, particularly if you could boast of your understanding of the template meta-programming that Andrei Alexandrescu first brought to the mainstream through his columns in C++ Report and his seemingly innocuously titled Modern C++ Design. Code reading became a pleasure to me, and code understanding was more satisfying, particularly if you could reuse some of those code snippets in your own creations. It was the first taste of how dense C++ code could be; it was as if every sentence had some hidden idioms that you were trying to unravel. That was exploratory code reading - as if I was trying to explore the horizons of the language and its idioms as the experts documented with great care. I subscribed to the view that Code is the Design.

    Collaborating with xUnit ..

    Then came unit testing and the emergence of xUnit frameworks that proved to be the most complete determinants of the virtues of code reading. Code reading changed from being a passive learning vehicle to an active reification of thoughts. Just fire up your editor, load the unit testing framework and validate your understanding through testXXX() methods. It was then that I realized the wonders of code reading through collaboration with unit testing frameworks. It was as if you are doing pair programming with xUnit - together you and your xUnit framework are trying to understand the library that you're exploring. TDD was destined to be the next step, the only change being that instead of code understanding you're now into real world code writing.