Summary of the Bar Camp Session at the 2nd Lightning Hackday in Berlin: Improving the Autopilot
I visited Berlin to attend the second Lightning Hackday and want to give a brief wrap-up of the event. This article covers two topics. First, as promised in my bar camp session on “Building an automated topology for autopilot features of the lightning network nodes”, I will give detailed minutes of the session itself. Second, I will talk about an already well-known technique called splicing, which I realized during the event might be one of the more important yet unimplemented features of lightning nodes.
Let me just give a quick shout-out to Jeff and his team from fulmo lightning: thanks for organizing this excellent event. I was especially astonished by how you brought together such a group of high-profile lightning network enthusiasts. Of the many tech events and conferences I have attended in the past, this was one of my top experiences.
About 30 people attended my bar camp session. Luckily, Conner Fromknecht and Olaoluwa Osuntokun from the San Francisco-based Lightning Labs also joined and contributed many insights from their hands-on experience. I started the session with a short presentation of my thoughts about the problem. I had previously written those up in my GitHub repo as a rough draft / sketch of a whitepaper on the topic, after opening Issue 677 in the LND project to criticize the current design choices of the autopilot. The core of those thoughts is a list of metrics and criteria that I would want to monitor and optimize in order to find a good network topology. Before the hackday the list looked like this (we discussed it in the session, and people basically agreed with it):
- Diameter: A small diameter produces short paths for onion routing. Short paths are preferable because a payment is less likely to fail if fewer nodes are involved in routing it.
- Channel balance: Channels should be properly funded but also the funds should be balanced to some degree.
- Connectivity / Redundancy: Removing nodes (in particular strongly connected nodes) should not be a problem for the remaining nodes / connectivity of the network.
- Uptime: It seems obvious that nodes with a high uptime are better candidates to open a channel to.
- Blockchain transactions: Since the blockchain only supports around 300k transactions per day, opening, closing and updating channels should be minimized.
- Fees for routing: If a node keeps routing payments along the same path, the accumulated routing fees may exceed the cost of a direct channel, so opening a channel (which is cost intensive) might be cheaper overall.
- Bandwidth, latency, …: Nodes can only process a certain number of routing requests. I also assume that HTLCs lock channel funds for a certain amount of time during onion routing.
- Internet topology: Routing through the network presumably becomes faster if the P2P network has a topology similar to that of the underlying physical network. It also makes sense because, even on the internet, people might mostly use products and services within their geographic region (these assumptions still need to be checked).
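Two of the criteria above, diameter and connectivity after a strongly connected node disappears, are easy to measure on a toy channel graph. The following is a minimal sketch, assuming an undirected graph stored as an adjacency dict and ignoring capacities, balances and fees; the graphs and helper names are made up for illustration.

```python
from collections import deque

# Hypothetical toy channel graph: node -> set of channel peers (undirected).
Graph = dict[str, set[str]]


def make_graph(channels: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dict from a list of channels."""
    graph: Graph = {}
    for a, b in channels:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph: Graph, source: str) -> dict[str, int]:
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def diameter(graph: Graph) -> float:
    """Longest shortest path; infinite if the graph is disconnected."""
    best = 0
    for node in graph:
        dist = bfs_distances(graph, node)
        if len(dist) < len(graph):
            return float("inf")
        best = max(best, max(dist.values()))
    return best


def without_node(graph: Graph, removed: str) -> Graph:
    """Copy of the graph with one node and all of its channels removed."""
    return {n: peers - {removed} for n, peers in graph.items() if n != removed}


star = make_graph([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D")])
ring = make_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")])

print(diameter(star), diameter(without_node(star, "H")))  # 2 inf
print(diameter(ring), diameter(without_node(ring, "A")))  # 2 3
```

Both toy graphs have diameter 2, but the star falls apart when its hub is removed, while the ring only ends up with a slightly longer diameter. An autopilot would have to weigh exactly this kind of trade-off between the first and third criteria.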
Before I state the new ideas that came from the attendees and the discussion, I want to briefly sum up the great keynote by the Lightning Labs team that preceded the bar camp session. In particular, I think the insights and techniques they talked about (atomic multi-path routing and splicing) have a huge impact on the autopilot and on the topology generation problem of the lightning network. Long story short, in my opinion the magic concept is splicing.
For those unfamiliar with the topic: splicing is the process of changing the funds of an existing payment channel by adding funds to it or removing (part of the) funds from it, without closing the channel. In the past I always thought that, even though no lightning node had implemented it, this was a technically rather trivial problem and thus of minor importance. The Lightning Labs developers basically said the same, emphasizing that splicing funds out of a channel is trivial and can even be done in a non-blocking way, so that the channel can be used again directly after splicing out, even before the splice-out transaction is mined. Splicing in (i.e. adding more funds to a channel), however, seems to be a little more cumbersome. Without going into too many technical details, the real eye-opener for me was that splicing (together with the autopilot) seems to make lightning wallets, in the form they exist today, obsolete. This obviously is a game changer. So to make it crystal clear:
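To make the accounting concrete, here is a toy model of what splicing does to a single channel from one party's point of view. It is only a sketch: it deliberately ignores the actual on-chain mechanics (the new funding transaction, signatures, mining fees and the non-blocking handling described in the keynote), and `Channel`, `splice_out` and `splice_in` are hypothetical names, not part of any lightning implementation.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Toy two-party channel; all amounts in satoshis (hypothetical model)."""
    local_sat: int   # our side of the balance
    remote_sat: int  # the peer's side of the balance

    @property
    def capacity_sat(self) -> int:
        return self.local_sat + self.remote_sat

    def splice_out(self, amount_sat: int) -> int:
        """Move part of our balance to an on-chain output; the channel stays open."""
        if not 0 < amount_sat <= self.local_sat:
            raise ValueError("can only splice out funds we own in the channel")
        self.local_sat -= amount_sat
        return amount_sat  # value of the new on-chain output (fees ignored)

    def splice_in(self, amount_sat: int) -> None:
        """Add on-chain funds to our side of the channel."""
        if amount_sat <= 0:
            raise ValueError("splice-in amount must be positive")
        self.local_sat += amount_sat


channel = Channel(local_sat=600_000, remote_sat=400_000)
channel.splice_out(250_000)
print(channel.capacity_sat)  # 750000
channel.splice_in(100_000)
print(channel.capacity_sat)  # 850000
```

The point of the model is that the channel object survives both operations: only its capacity and our share of it change, whereas without splicing the same effect would require closing and reopening the channel.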
With splicing and the autopilot implemented in a standard bitcoin wallet (running in the background without users needing to be aware of it), users can send funds to one another efficiently, quickly and cheaply. If a path via lightning exists, the transaction will be lightning fast. If no path can be found, one could just splice out some funds from existing channels to create a new channel for the transaction. This basically boils down to an ordinary on-chain transaction, which is what would have happened before we had the lightning network anyway. So it doesn’t matter if all funds are currently locked up in payment channels. Moreover, the on-chain transaction is not wasted: it doubles as the funding transaction of the next channel, increasing the size of the lightning network. I dare to say that, in general, a bigger lightning network is a better lightning network. Basically, the goal would be that eventually all bitcoin funds are locked in payment channels (which, with splicing, obviously doesn’t reduce the control or flexibility of the user). In case a user really needs to make a transaction which can’t be achieved via lightning, the funds will just be spliced out, which takes as much processing time as a regular on-chain transaction. As a disclaimer: watchtowers [1] are obviously still needed, in particular in this scenario, in which users might not even realize they are using the lightning network.
Taking the opportunities of splicing into consideration, I think the topology problem of the autopilot becomes an issue of only minor importance. One can easily splice out of existing payment channels into new payment channels if needed. The downside is that such a transaction is not routed at lightning speed but takes a couple of block times to be mined and processed. However, it eventually creates a user-generated network topology that hopefully follows the actual flows of funds quite closely and would thus be rather robust. The drawbacks of such a process are that transactions would frequently include channel creations, which take time, and that only a maximum of roughly 300k channels can be opened or altered per day on top of the current bitcoin protocol. This observation explains why topology generation by the autopilot is still a reasonable topic to think about: it should still help to move traffic away from the blockchain.
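A back-of-the-envelope version of this limit, assuming every channel open, close or splice costs exactly one on-chain transaction and that the whole block space (the roughly 300k transactions per day quoted above) were available for it. The user numbers in the example are purely hypothetical.

```python
TX_PER_DAY = 300_000  # rough on-chain capacity figure used in this article


def days_needed(channel_operations: int, tx_per_day: int = TX_PER_DAY) -> float:
    """Days of full block space needed if each channel operation is one transaction."""
    return channel_operations / tx_per_day


# Hypothetical example: 10 million users each opening 3 channels.
print(days_needed(10_000_000 * 3))  # 100.0
```

Even under these generous assumptions, channel churn alone could fill the blockchain for months, which is why a good initial topology still matters.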
Finally, here are some new thoughts that were collected during the session. I will also update my whitepaper soon. Feel free to fork it on GitHub and open a pull request to fix mistakes or add your own thoughts.
Number of nodes reachable within x hops:
It was pointed out that this metric looks quite attractive. In the discussion we realized, however, that greedily optimizing it would basically lead to a scenario in which every node opens a channel to the most central node in the network, since such a central node increases the number of nodes reachable within x hops by the largest amount. Still, it looks like an important number to optimize for in some way.
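The metric and the greedy heuristic from the discussion can be sketched in a few lines, assuming an unweighted, undirected channel graph (no capacities) and hypothetical node names. A newcomer that simply picks the peer maximizing the number of nodes reachable within x hops ends up choosing the hub, which is exactly the failure mode described above.

```python
from collections import deque


def make_graph(channels: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency dict from a list of channels."""
    graph: dict[str, set[str]] = {}
    for a, b in channels:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable_within(graph: dict[str, set[str]], source: str, max_hops: int) -> int:
    """Number of other nodes reachable from source in at most max_hops hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for peer in graph[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return len(dist) - 1


def greedy_best_peer(graph: dict[str, set[str]], newcomer: str, max_hops: int) -> str:
    """Peer that maximizes reachable_within after the newcomer opens one channel to it."""
    def score(candidate: str) -> int:
        trial = {n: set(peers) for n, peers in graph.items()}
        trial[newcomer] = {candidate}
        trial[candidate] = trial[candidate] | {newcomer}
        return reachable_within(trial, newcomer, max_hops)

    return max((n for n in graph if n != newcomer), key=score)


graph = make_graph([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"),
                    ("A", "E"), ("E", "F")])
print(greedy_best_peer(graph, "N", max_hops=2))  # H
```

If every newcomer runs the same greedy rule, all of them attach to `H`, which makes the network ever more dependent on a single node.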
Honesty and good behavior of nodes:
Following a similar line of thought, we discussed whether a distributed topology creation algorithm should aim for the global health of the network, as opposed to greedy strategies in which every node tries to optimize its own view of the network. Though it wasn’t pointed out in the session, I think that a strategy where every node tries to optimize its own access to the network will at some point reach a Nash equilibrium. With my rather limited understanding of game theory, I think this might not necessarily be the best solution from a global perspective. We also discussed that, in the scenarios mentioned below where clients share information with their neighbors, an algorithm must be robust against fraudulent behavior or lying nodes.
Pretty much everyone agreed right away that different nodes might have different needs from the lightning network. So the topology creation should be configurable (or learnable by the node), taking into account whether the node is just a casual consumer or a shop, bank, exchange, …
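One simple way to make the topology creation configurable is a per-node profile of weights over the criteria listed earlier. This is a hypothetical sketch: the profile names, the weights and the assumption that each metric is already normalized to [0, 1] are illustrative, not measured or recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutopilotProfile:
    """Hypothetical per-node preferences: weights for metrics normalized to [0, 1]."""
    reachability: float
    uptime: float
    redundancy: float
    low_fees: float


# Illustrative presets only; the numbers are made up.
PROFILES = {
    "consumer": AutopilotProfile(reachability=0.5, uptime=0.2, redundancy=0.1, low_fees=0.2),
    "shop": AutopilotProfile(reachability=0.3, uptime=0.3, redundancy=0.4, low_fees=0.0),
}


def score(candidate_metrics: dict[str, float], profile: AutopilotProfile) -> float:
    """Weighted sum of a candidate peer's normalized metrics."""
    return sum(weight * candidate_metrics.get(name, 0.0)
               for name, weight in vars(profile).items())


candidate = {"reachability": 0.8, "uptime": 0.9, "redundancy": 0.1, "low_fees": 0.5}
for role, profile in PROFILES.items():
    print(role, round(score(candidate, profile), 2))
# consumer 0.69
# shop 0.55
```

The same candidate peer is attractive for a consumer who mostly wants reach and cheap routes, but less so for a shop that cares about redundancy; a learnable variant would adjust these weights from the node's own history instead of using fixed presets.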
Privacy vs information sharing:
We discussed quite extensively that, for a distributed algorithm to predict which channels should be created, it would be great if channel balances were public (or if at least some rough information about the distribution of the balance within a channel were available). We realized that, as a first step and following the ideas of lnd Issue 1253, nodes should start collecting historic information about their own operations and routing activities. Actually, I just saw that a pull request claiming to resolve issue 1253 already exists. We also realized that channel fees might act as a reasonably good proxy for the channel balance. Assume Alice and Bob have a channel and the balance is very skewed, in the sense that Alice has almost no funds and Bob has all of them. If Bob were asked to route a payment through that channel, he would probably charge a smaller fee than Alice would if she were asked to route a payment through her almost dried-up side of the channel.
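The Alice and Bob example can be sketched with a hypothetical fee policy in which a node charges more the less of the channel it owns. Under the strong assumption that every node uses this same, publicly known linear policy, an observer can invert an advertised fee into an estimate of the balance split. Real implementations let operators set fees freely, so this is an illustration of the proxy idea, not of how lnd sets fees.

```python
BASE_PPM = 100   # hypothetical fee (parts per million) when our side of the channel is full
MAX_PPM = 2_000  # hypothetical fee when our side of the channel is empty


def fee_ppm(local_sat: int, capacity_sat: int) -> int:
    """Hypothetical policy: the less of the channel we own, the more we charge to forward."""
    local_share = local_sat / capacity_sat
    return round(BASE_PPM + (MAX_PPM - BASE_PPM) * (1 - local_share))


def estimate_local_share(observed_ppm: int) -> float:
    """Invert the policy above; only meaningful if the peer is known to use it."""
    return 1 - (observed_ppm - BASE_PPM) / (MAX_PPM - BASE_PPM)


capacity = 1_000_000
alice_fee = fee_ppm(local_sat=50_000, capacity_sat=capacity)   # almost dried up
bob_fee = fee_ppm(local_sat=950_000, capacity_sat=capacity)    # holds most funds
print(alice_fee, bob_fee)  # 1905 195
print(round(estimate_local_share(alice_fee), 2))  # 0.05
```

Because fees are rounded and nodes may lie or use other policies, such an estimate can only ever be rough, which matches the "reasonably good proxy" framing above.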
Still, the idea came up that nodes could share their channel balances with neighbors, similar to how routing information is shared with neighbors in IP networks. In this way, each node would eventually construct a map of paths.
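A minimal sketch of this idea, modeled on distance-vector routing: in synchronous rounds, every node advertises to its neighbors, for each destination, the largest amount it could forward there (the bottleneck of its best known path), and neighbors update their own tables from these advertisements. The sketch assumes honest nodes, full disclosure of spendable balances and no privacy protections, all of which were open questions in the discussion; the function name and example amounts are hypothetical.

```python
import math

# Hypothetical liquidity map: (from_node, to_node) -> satoshis spendable in that direction.
Liquidity = dict[tuple[str, str], int]


def widest_path_tables(liquidity: Liquidity) -> dict[str, dict[str, tuple[float, str]]]:
    """Each node learns, per destination, the max forwardable amount and the next hop."""
    nodes = {n for edge in liquidity for n in edge}
    tables = {n: {n: (math.inf, n)} for n in nodes}
    changed = True
    while changed:
        changed = False
        # Snapshot of what every node advertises to its neighbors this round.
        advertised = {n: dict(t) for n, t in tables.items()}
        for (node, peer), spendable in liquidity.items():
            for dest, (bottleneck, _) in advertised[peer].items():
                if dest == node:
                    continue
                candidate = min(spendable, bottleneck)  # path limited by its weakest hop
                if dest not in tables[node] or candidate > tables[node][dest][0]:
                    tables[node][dest] = (candidate, peer)
                    changed = True
    return tables


liquidity = {("A", "B"): 500, ("B", "C"): 200, ("A", "C"): 100, ("C", "D"): 1_000}
tables = widest_path_tables(liquidity)
print(tables["A"]["D"])  # (200, 'B')
```

Unlike a hop-count table, which would send A's payments to D via the direct but thin A–C channel, the balance-aware table prefers the longer path via B because it can carry more. Since advertised values only ever increase and are bounded by channel balances, the rounds terminate; a lying node, however, could simply advertise inflated values, which is why robustness against fraud came up in the session.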
Another point that was mentioned, which in my opinion is important but only somewhat related to these issues: nodes should of course charge higher routing fees if the timelock of a routing request is long, since in the worst case it makes a channel, and with it many other paths, unusable for quite some time, possibly without a payment ever taking place. As a side note, I just realized that this could give nodes a more sophisticated way to estimate their fees if they are aware of the number of routing paths their channel makes possible.
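A hypothetical fee formula that prices in the timelock. The first two terms mirror the base fee plus proportional fee of the BOLT 7 `channel_update` (`fee_base_msat`, `fee_proportional_millionths`); the third, timelock-dependent term and its `lock_ppm_per_block` parameter are an illustrative extension, not part of any specification or implementation.

```python
def htlc_fee_msat(amount_msat: int, cltv_delta_blocks: int,
                  base_fee_msat: int = 1_000,
                  fee_rate_ppm: int = 100,
                  lock_ppm_per_block: int = 1) -> int:
    """Base fee + proportional fee + a hypothetical charge for how long funds stay locked."""
    proportional = amount_msat * fee_rate_ppm // 1_000_000
    # Hypothetical extension: the longer the HTLC may lock the channel, the more we charge.
    lock_cost = amount_msat * cltv_delta_blocks * lock_ppm_per_block // 1_000_000
    return base_fee_msat + proportional + lock_cost


amount = 1_000_000_000  # 1,000,000 sat expressed in millisatoshi
print(htlc_fee_msat(amount, cltv_delta_blocks=40))   # 141000
print(htlc_fee_msat(amount, cltv_delta_blocks=144))  # 245000
```

A node that also knows how many routing paths depend on its channel could scale `lock_ppm_per_block` accordingly, which is one way to read the side note above.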
Some technical details:
One could use the number of disjoint paths between nodes as a criterion, since many disjoint paths also enable heavy use of atomic multi-path payments. It was also mentioned that one could look at the autonomous system (AS) numbers of the underlying internet hosts.
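Counting edge-disjoint paths between two nodes is a maximum-flow problem (Menger's theorem): give every channel a capacity of 1, and the maximum flow equals the number of edge-disjoint paths. The sketch below uses Edmonds-Karp (repeated BFS augmenting paths) on an undirected channel list and ignores balances; counting node-disjoint paths would additionally require splitting each node into an in/out pair. The helper name and example graphs are hypothetical.

```python
from collections import deque


def edge_disjoint_paths(channels: list[tuple[str, str]], source: str, target: str) -> int:
    """Max number of edge-disjoint paths = max flow with unit capacity per channel."""
    capacity: dict[tuple[str, str], int] = {}
    for a, b in channels:
        # An undirected channel becomes two opposite arcs of capacity 1 each.
        capacity[(a, b)] = capacity.get((a, b), 0) + 1
        capacity[(b, a)] = capacity.get((b, a), 0) + 1
    neighbours: dict[str, set[str]] = {}
    for a, b in capacity:
        neighbours.setdefault(a, set()).add(b)

    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        previous: dict[str, str | None] = {source: None}
        queue = deque([source])
        while queue and target not in previous:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in previous and capacity[(node, nxt)] > 0:
                    previous[nxt] = node
                    queue.append(nxt)
        if target not in previous:
            return flow
        # Push one unit of flow back along the path, updating residual capacities.
        node = target
        while previous[node] is not None:
            prev = previous[node]
            capacity[(prev, node)] -= 1
            capacity[(node, prev)] += 1
            node = prev
        flow += 1


ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain = [("A", "B"), ("B", "C")]
print(edge_disjoint_paths(ring, "A", "C"))   # 2
print(edge_disjoint_paths(chain, "A", "C"))  # 1
```

In the ring, a payment from A to C could be split over two independent routes, which is the property that makes this metric attractive for atomic multi-path payments.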
Why not use machine learning instead of trying to find some smart algorithm / heuristics?
For me the most surprising idea was that this entire autopilot problem could easily be turned into a machine learning problem. There are some obvious issues: single nodes might not have enough training data, and even if one node has enough data, sharing its model would probably not work out of the box either. So we briefly discussed whether this would be a use case for transfer learning. I will not dig deeper into the topic here, since the article is already quite long. But as a data scientist working in the field of machine learning, I was a little shocked and surprised that I had not given this idea the slightest thought before the event.
Anyway, I hope my summary of the session will be useful for you and the community! My personal roadmap now consists of four things:
- I am thinking about adding a splicing protocol specification to the BOLTs (lightning-rfc).
- I want to get up and running with Go and the lnd codebase in order to do some hands-on experiments.
- I plan to update the very rough draft of the whitepaper.
- Finally, I hope to find the time to hack a little Python script that simulates how the splicing strategy described above would create a lightning network which is able to route most payment requests.
If you want to stay updated, just follow me on Twitter, where I will let you know once I am done. Also feel free to leave a comment with more ideas or to extend the draft of the whitepaper. I would love to join forces working on this topic.
Also, kudos to Susette Bader, who took the really nice snapshot used as the cover image of this post while we were hacking.
[1] A channel stays open until it is closed. If your LN wallet goes offline, the channel stays open. The problem when you are offline is that the other party can broadcast an earlier channel state that benefits her. If you are not watching the blockchain for such fraudulent transactions yourself, you have to hire a service that does it for you: a watchtower.