No Fluff Just Stuff 2013 Day 3: Simulation Testing with Simulant
Simulation Testing with Simulant (Stuart Halloway)
November 10, 2013: 11:00 AM – 12:30 PM
From the conference materials:
Simulation allows a rigorous, scalable, and reproducible approach to testing. The separation of concerns, and the use of a versioned, time-aware database, give simulation great power. This talk will introduce simulation testing, walking through a complete example using Simulant, an open-source simulation library.
Simulation allows a rigorous, scalable, and reproducible approach to testing:
Statistical modeling
Simulation begins with statistical models of the use of your system. This model includes facts such as "we have identified four customer profiles, each with different browsing and purchasing patterns" or "the analytics query for the management report must run every Wednesday afternoon." Models are versioned and kept in a database.
Activity streams
The statistical models are used to create activity streams. Each agent in the system represents a human user or external process interacting with the system, and has its own timestamped stream of interactions. With a large number of agents, simulations can produce the highly concurrent activity expected in a large production system.
Distributed execution
Agents are scaled across as many machines as are necessary to both handle the simulation load, and give access to the system under test. The simulator coordinates time, playing through the activity streams for all the agents.
Result Capture
Every step of the simulation process, including modeling, activity stream generation, execution, and the code itself, is captured and stored in a database for further analysis. You will typically also capture whatever logs and metrics your system produces.
Validation
Since all phases of a simulation are kept in a database, validation can be performed at any time. This differs markedly from many approaches to testing, which require in-the-moment validation against the live system.
Separation of concerns
The separation of concerns above, and the use of a versioned, time-aware database, give simulation great power. Imagine that you get a bug report from the field, and you realize that the bug corresponds to a corner case that you failed to consider. With a simulation-based approach, you can write a new validation for the corner case, and run that validation against your past simulation results, without ever running your actual system.
This talk will introduce simulation testing, walking through a complete example using Simulant, an open-source simulation library.
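To make the modeling and activity-stream phases above concrete, here is a minimal Clojure sketch of turning a statistical model into timestamped action streams. This is not Simulant's actual API; the profile names, action types, and rates are assumptions made up for illustration.

```clojure
;; Hypothetical statistical model: weighted customer profiles, each with
;; its own action mix and think time.
(def model
  {:agent-count 100
   :duration-ms (* 10 60 1000)
   :profiles    {:browser   {:weight 0.7 :mean-think-ms 8000
                             :actions [:browse :search]}
                 :purchaser {:weight 0.3 :mean-think-ms 5000
                             :actions [:browse :add-to-cart :checkout]}}})

(defn sample-profile
  "Weighted random choice of a profile key."
  [profiles]
  (let [total (reduce + (map :weight (vals profiles)))
        r     (rand total)]
    (loop [[[k v] & more] (seq profiles) acc 0.0]
      (let [acc (+ acc (:weight v))]
        (if (or (< r acc) (nil? more)) k (recur more acc))))))

(defn agent-stream
  "One agent's timestamped stream of interactions."
  [agent-id {:keys [profiles duration-ms]}]
  (let [profile (get profiles (sample-profile profiles))]
    (loop [t (rand-int (:mean-think-ms profile)) actions []]
      (if (>= t duration-ms)
        actions
        (recur (+ t 1 (rand-int (* 2 (:mean-think-ms profile))))
               (conj actions {:agent agent-id
                              :at-ms t
                              :type  (rand-nth (:actions profile))}))))))

;; Merge all agents' streams into one globally time-ordered activity stream.
(def activity-stream
  (sort-by :at-ms (mapcat #(agent-stream % model) (range (:agent-count model)))))
```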
From my personal notes:
- opened up with explanation of the Datomic database browser, used to browse database schemas
- why doesn't your testing suite have a database?
- Datomic uses a "persistent data structure" that provides the ability to execute time-based queries
- the speaker argued that, across his entire career as a software developer, this ability has been the most powerful element he has encountered
- what Datomic provides is the ability to go back and query and analyze previous test results rather than running new tests (see the as-of sketch after these notes)
- recording events supports the scientific method
- the speaker argued that what this product provides is similar to time-based source control, but for data
- the type of testing associated with time-based data simulations is referred to as "example-based testing" – EBT
- the speaker argued that the human curation piece is what makes this example-based
- "model" = kinds of agents, interactions as distributions, stored in a database
- "inputs" = agents, time-stamped action stream stored in a database
- "execution" = driver program, coordinated through a database, maps agents to processes
- "outputs" = system storage, logs, metrics, all put in a database
- what you end up with is 3 databases – a database that serves as a model of your app, a database of inputs, and a very large database of outputs
- "validation" = database queries may be probabilistic
- Simulant is a simulation testing library written in Clojure that uses Datomic
- Datomic itself is written in Clojure
- what you should do is implement software like Simulant in a language of your choice
- Clojure and Datomic were chosen by the speaker because of his familiarity with them
- the financial transactions example the speaker walked through demonstrated examples of "tests", "agents", and "actions"
- the "clock" here is used to change the speed of time during simulations
- "service" = manage lifecycle for external systems – essentially start, stop, etc other systems
- "codebase" – remember your code base – the ability to reproduce experiments using code location, version, etc
- Datomic's query language is "Datalog" – in order to execute queries you need to understand the query language – constants, variables, etc (see the query sketch after these notes)
- "data pattern" – constrains the results returned, binds variables
- the name "Datomic" is derived from the fact that it uses "datums"
- "structural rigidity" is exibited by SQL because "select *" from 1
table doesn't really tell you everything – some data is in other tables - the universal data table used here consists solely of "entity",
"attribute", and "value", and is biased toward readers and away from
writers - implicit joins here are not structural in nature – whenever
a variable appears more than once, a join is involved to read all of
them - there is actually a fourth column, "transaction", not being focused on during this part of the presentation
- in queries, a fifth item, "database", also needs to be specified
- during his example, he started by saying that the database used for testing is Datomic, but the app database also uses Datomic, potentially making his demonstration confusing at first
- the in-memory database option makes initial testing very quick, but is obviously not for production usage
- because Datomic is a functional database, it provides data in a lazy manner based on what you need, rather than providing everything available
- during his example, the speaker explained that a lot of input is often needed to generate meaningful test results
- "temporal decoupling"
- the speaker moved beyond his familiarity with Clojure and his "personal pride" in using the language to explain the benefits of Clojure
- benefits of Clojure – multimethods, sequences, laziness, and agents
- Simulant is only 700 lines of code, which is why the speaker says it can be replicated by the audience in other languages
- benefits of Datomic – open schema, Datalog, time model, functional, multi-database queries
- because Datomic queries run within your address space, you can easily execute cross-database joins etc (see the cross-database sketch after these notes)
- the speaker advised not to stop the testing you do now until you get comfortable with this type of testing
- the speaker claimed that while generative testing can typically start immediately, comfort with simulation testing will probably take at least a week
- simulation requires time and thought, so you should not use this type of testing with one-off apps or those with minimal users
- in response to a question from the audience, the speaker explained that he used to advocate that developer teams do the simulation testing, but now he advocates an independent party, internal or external to the company
- playing back past history is not simulation – it is not simulation until you use a model that has knobs on it
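A minimal sketch of the Datalog basics mentioned in the notes, using Datomic's in-memory peer library. The :order/* attributes and the sample data are invented for illustration; the point is the query shape – constants, variables, data patterns, and the implicit join on ?e.

```clojure
;; Sketch of Datalog basics with the Datomic peer library (in-memory).
(require '[datomic.api :as d])

(def uri "datomic:mem://simulant-notes")
(d/create-database uri)
(def conn (d/connect uri))

;; every fact is an entity-attribute-value(-transaction) datom
@(d/transact conn [{:db/ident :order/customer :db/valueType :db.type/string
                    :db/cardinality :db.cardinality/one}
                   {:db/ident :order/total :db/valueType :db.type/long
                    :db/cardinality :db.cardinality/one}])
@(d/transact conn [{:order/customer "alice" :order/total 40}
                   {:order/customer "alice" :order/total 25}
                   {:order/customer "bob"   :order/total 90}])

;; data patterns [?e :attr ?v] constrain results and bind variables;
;; "alice" is a constant, ?total is a variable, and reusing ?e across
;; the two patterns is the implicit join; the database itself is the
;; extra input passed after the query
(d/q '[:find ?total
       :where [?e :order/customer "alice"]
              [?e :order/total ?total]]
     (d/db conn))
;; => #{[40] [25]}
```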
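Continuing the same sketch, the "go back and query previous results" idea: capture a basis-t, transact more data, and then run a brand-new validation query against the database as it was earlier, without re-running anything.

```clojure
;; Remember where the database was before the next transaction.
(def t-before (d/basis-t (d/db conn)))

@(d/transact conn [{:order/customer "alice" :order/total 999}])

;; A new validation query, run against the database as of the earlier
;; point in time – the 999 order is invisible to the as-of view.
(d/q '[:find (sum ?total) .
       :with ?e
       :where [?e :order/total ?total]]
     (d/as-of (d/db conn) t-before))
;; => 155   (40 + 25 + 90)

;; d/history exposes every assertion and retraction ever made, so
;; validations can also look at how the data changed over time.
(d/q '[:find ?total ?added
       :where [_ :order/total ?total _ ?added]]
     (d/history (d/db conn)))
```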
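Finally, the cross-database join point: because the query engine runs in the peer's address space, one query can take several database values as inputs. The second database and its :customer/* attributes below are invented for illustration.

```clojure
;; A second in-memory database to join against the orders above.
(def uri2 "datomic:mem://simulant-notes-customers")
(d/create-database uri2)
(def conn2 (d/connect uri2))

@(d/transact conn2 [{:db/ident :customer/name :db/valueType :db.type/string
                     :db/cardinality :db.cardinality/one}
                    {:db/ident :customer/segment :db/valueType :db.type/keyword
                     :db/cardinality :db.cardinality/one}])
@(d/transact conn2 [{:customer/name "alice" :customer/segment :vip}
                    {:customer/name "bob"   :customer/segment :standard}])

;; ?name appearing in patterns against both sources is the cross-database join.
(d/q '[:find ?name ?segment ?total
       :in $orders $customers
       :where [$orders    ?o :order/customer   ?name]
              [$orders    ?o :order/total      ?total]
              [$customers ?c :customer/name    ?name]
              [$customers ?c :customer/segment ?segment]]
     (d/db conn) (d/db conn2))
;; one row per (order, customer) pair, e.g. ["bob" :standard 90]
```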