Design Rationale behind the Commons Stack cadCAD refactor

Introduction

Since March 2020 I’ve been working on the Commons Stack’s cadCAD simulation of the Commons, refactoring and redesigning it to become a part functional, part object oriented cadCAD simulation. This post documents the design rationale and thoughts I had in mind for the refactor, and establishes the desired design direction for the future.

What does this simulation simulate

The Commons (longer explanation here)is a group of Participants who can create and vote on Proposals, which receive funding and either fail or succeed. The Participants have vesting/nonvesting tokens depending on when they entered the simulation. The bonding curve controls the price, supply and the funding pool/collateral pool of the Commons. Participants have an individual sentiment, which can affect if they decide to stay in the Commons or exit completely, and if new Participants decide to join.

The original code was written by Blockscience’s resident math genius, Michael Zargham, where the relevant code was in conviction_system_logic3.py, conviction_helpers.py and conviction_cadCAD3.ipynb. I added explanations in my own fork, and started refactoring in coodcad. Later on this was merged with the Commons Stack game server backend repo in commons-simulator.

Participants and Proposals are represented as nodes in a structure depicted below, and the lines (edges) describe their relationships to each other. The general name for this type of data structure is a Directed Graph, where a Graph is basically a collection of nodes, and the “directed” just means that the edges describe a relationship “from this node to that node”. Alternately I just call it the “network”.

In this graph (not a Graph) only the Participants are depicted, with the lines representing their influence on each other.

Just some problems I had with the original code

The code badly needed “models”, like what we have in web server backend programming, and a layer that abstracted away the details of modifying the directed graph.

Or, in other words, we needed a concept of a “thing” instead of a collection of attributes, and the creation of a “thing” should be separated from “how to add that thing to the network” (which wasn’t happening).

Because everything used dictionary attributes, it was impossible for the linter to know anything about the types and thus point out programming errors.

def gen_new_participant(network, new_participant_holdings):
    i = len([node for node in network.nodes])

    network.add_node(i)
    network.nodes[i]['type']="participant"

    s_rv = np.random.rand()
    network.nodes[i]['sentiment'] = s_rv
    network.nodes[i]['holdings']=new_participant_holdings

    for j in get_nodes_by_type(network, 'proposal'):
        network.add_edge(i, j)

        rv = np.random.rand()
        a_rv = 1-4*(1-rv)*rv #polarized distribution
        network.edges[(i, j)]['affinity'] = a_rv
        network.edges[(i,j)]['tokens'] = a_rv*network.nodes[i]['holdings']
        network.edges[(i, j)]['conviction'] = 0
        network.edges[(i,j)]['type'] = 'support'

    return network

As you can see, the attributes that make up a Participant are defined ad-hoc. This part of the code may know that a Participant is supposed to have ‘sentiment’ and ‘holdings’ but what about every other function that deals with Participant? If you decide to add a new attribute to a Participant, you’d have to update the code everywhere else and you don’t know if you did everything correctly (there are no tests), and the linter can’t help you. This is a good case for having a class or at least a struct, to be used as a model/”thing”.

The edges have the same problem, but I didn’t find it worthwhile to change that into a class so I just left them as dict attributes.

In the for loop, you can see that it is simply setting up the relationships between the newly added Participant and the Proposal. This is merely tangentially related to the original task of adding a Participant to the graph, and if one were to make changes to the Participant-Proposal relationship, one would not remember to update this function. This task of ensuring the new Participant has a relationship to every Proposal is best handled by a separate function.

The mechanisms that affect sentiment were spread between several policy and state update functions – it was impossible to keep track of when this variable was touched.

sentiment (the same word was used for overall sentiment as well as individual Participants’ sentiment) is mentioned in gen_new_participant, driving_process, complete_proposal, update_sentiment_on_completion, update_proposals, update_sentiment_on_release, participants_decisions, initialze_network. Notice how the function names imply that sentiment simulation is highly integrated into the simulation and not something you can just “turn on/off”. Furthermore you can’t tell by the name of the function whether it uses sentiment or not. In the long term this is unmaintainable.

Upon trying to add an exit tribute (or anything) to the code, it was not easy to see where it should be added, and if adding it would break something else. This was a natural consequence of all these little problems building up.

To test if the code worked, I had to run the entire simulation. There were no unit tests, no function could run independently of cadCAD.

This also made it impossible to know if everything was working as intended, or if some happy coincidences happened (because Participants’ actions are random) that didn’t crash the simulation this time – or worse, continues running but does something completely unintended.

Even if the simulation worked, I wasn’t confident it was doing what was intended. Yet another natural consequence. In fact I found some things (which I’ve long since forgotten) that weren’t actually used, or working as expected.

Why is the new code designed the way it is?

As you could see already, Participant and Proposal needed to become “objects/things” as opposed to a random collection of attributes (that may not be manipulated consistently by the code). This helps pylint check that you’re manipulating the attributes correctly. Let the computer check for programming errors as much as possible with the linter so that I don’t have to run the simulation to see if things work or not.

When you start thinking about Participants and Proposals as things, you can start to imagine how you could, instead of a function that says “randomly do this action 30% of the time”, you could simulate Participants as individuals: “randomly do this action x% of the time based on this Participant’s personality, which can be profit-seeking/altruistic/balanced”.

The ultimate goal is to put a neural network in the Participant class, tell it to optimize for profit/self satisfaction, and see what behaviour emerges.

TokenBatch badly needed to be a class in order to implement token vesting. A TokenBatch keeps track of vested and unvested tokens, and how many of the vested tokens were spent once unlocked.

Implementing TokenBatch as a class enabled lots of convenience functions that would simplify code using it.

AugmentedBondingCurve, Commons followed the same logic: once I make changes to something, I really don’t want to think about the details of what’s going on under the hood. I just want it to work. Plus having them as “things” fits how people naturally think about the concepts.

It was quite important that the structure of the code be self-explanatory and not require prior understanding. Having the Commons as a class, a thing, even though it meant running some extra state update functions to copy token_supply, funding_pool, collateral_pool out of it so that cadCAD could access them easily, facilitated this.

At the same time, the functional principles (it should be easy to see that a function cannot cause unintended side effects; functions are things that modify data, which remains unchanged otherwise) that cadCAD was based on are still valuable.

Classes should be used to abstract complexity away and fit the code to how a normal human might think of the concepts. For everything else, stick to the functional paradigm.

Walking Through The Codebase

simulation.py is where the cadCAD simulation is setup. simrunner.py, simrunner.ipynb are simply frontends to run this from the CLI or Jupyter notebook.

policies.py: Here live all the policies. They are just simple functions, but there are so many of them, so they’re grouped under classes with @staticmethod. When writing or changing them, do create a corresponding unit test in policies_test.py.

The idea is if you decide one day that there should be a different way of deciding whether a new Participant joins the Commons, you create a p_desc_of_your_strategy_here() policy under GenerateNewParticipant. Please, don’t just edit p_randomly().

entities.py: Proposal and Participant live here.

Normally in cadCAD system-dynamics style simulations the policy functions decide what happens in the simulation, but the code here is more of an ‘agent based modeling’ simulation so the policies also ask the Proposals and Participants what they will do.

System dynamics style: There is a 30% chance of a new Proposal being created every round. Assign it to a random Participant.

Agent based style: Each Participant is asked if they want to create a new Proposal – they have a 30% chance of saying yes.

Since Proposal, Participant have tunable behaviour and thus need configuration parameters, but cannot (or maybe, should not? – because the concept of a Proposal/Participant is unrelated to cadCAD in general) know about the existence of cadCAD and its params variable, their tunable constants are defined in a different place, config.py.

I mentioned before that one reason to have classes is to allow the linter to introspect operations and warn us if we’re doing something stupid. In practice this didn’t happen so much because we still have to use network.nodes[0]["item"] which could be anything. It could be useful to replace all instances of network.nodes[0]["item"] with a function from the data layer called get_participant(idx: int) -> Participant, which will make it clear to the linter that we are going to operate on a Participant now.

network_utils.py: This is the data layer. It translates business logic operations to network.DiGraph operations.

Simply put, when we want to add a participant to the network, that’s all we want to think about – we don’t want to have to remember “this is how to use network.DiGraph to accomplish this; oh and remember to setup the influence and support edges for this new Participant”.

As time goes by this has become a huge collection of assorted functions which has become a pain to import everywhere – I’ve been considering rolling them into a class with methods, but I need to see how much it disturbs the cadCAD functional paradigm. For now it’s not such a big problem. As mentioned before, it could use a simple get_participant(idx: int) -> Participant type of function which not only informs the linter, but does a type check so it can guarantee we’re not actually operating on a Proposal.

Quirk 1: object oriented weirdness in simulation.py

initial_conditions = {
        "network": network,
        "commons": commons,
        "funding_pool": commons._funding_pool,
        "collateral_pool": commons._collateral_pool,
        "token_supply": commons._token_supply,
        "policy_output": None,
        "sentiment": 0.5
    }

This is the initial state condition of the simulation, which is available later in the state update functions as s. As you can see, three of these state variables are actually copied from commons, and if we were to proceed as normal like in most cadCAD simulations, these variables would get out of date. Which is why whenever commons changes, we need to copy them out from commons into s with these state update functions – notice how update_avg_sentiment() is similar.

def update_collateral_pool(params, step, sL, s, _input):
    commons = s["commons"]
    s["collateral_pool"] = commons._collateral_pool
    return "collateral_pool", commons._collateral_pool

def update_token_supply(params, step, sL, s, _input):
    commons = s["commons"]
    s["token_supply"] = commons._token_supply
    return "token_supply", commons._token_supply

def update_funding_pool(params, step, sL, s, _input):
    commons = s["commons"]
    s["funding_pool"] = commons._funding_pool
    return "funding_pool", commons._funding_pool

def update_avg_sentiment(params, step, sL, s, _input):
    network = s["network"]
    s = calc_avg_sentiment(network)
    return "sentiment", s

Quirk 2: two state update functions depend on the output of one policy but change the same variable

    {
        "policies": {
            "which_proposals_should_be_funded": ProposalFunding.p_compare_conviction_and_threshold
        },
        "variables": {
            "network": ProposalFunding.su_make_proposal_active,
            "commons": ProposalFunding.su_deduct_funds_from_funding_pool,
            "policy_output": save_policy_output,
        }
    },
    {
        "policies": {},
        "variables": {
            "network": ParticipantExits.su_update_sentiment_when_proposal_becomes_active,
        }
    },

The decisions made in ProposalFunding.p_compare_conviction_and_threshold are needed by ProposalFunding.su_make_proposal_active and ParticipantExits.su_update_sentiment_when_proposal_becomes_active, which both update the same state variable, network. This makes things awkward, because even though I could combine them into one function, the truth is they are separate mechanisms and this would make the code untidy.

The solution is to save the output of the policy into the state variables dict s instead, so that the result of ProposalFunding.p_compare_conviction_and_threshold is still available outside of that state update block.

def save_policy_output(params, step, sL, s, _input):
    return "policy_output", _input

Summary and Intentions

All code documentation runs the risk of being quickly outdated – but the reason I wrote this post was to keep the original principles behind the code clear even as it changes in the future. Plus, a snapshot in time that explains context is always useful.