Improve the dump format for STV elections
Open, Needs Triage | Public | Feature

Description

The dump format should be standard so that anyone can perform a tally on the data, in a similar way to other voting methods. At the moment the output must be sanitised before it can be effectively tallied by third-parties.

Event Timeline

We need more specification here:

  • Is there a "standard" that we can use?
  • Is there an example of the "sanitation process"?
  • Are there examples of third parties that do the tallying? Maybe there is a tool already available.

FTR: There is some nice test data available https://github.com/dominic998/SecurePoll-Test-Data
Can this be a starting point?

Sorry for my delay as I investigated these questions @Osnard.

As I work on writing these answers on this task, the usefulness of the request actually seems less clear to me. :) I wonder if the core of the problem needing solved here is actually the encryption piece making it technically difficult for observers to run their own tallies... I will leave my comment below as written regardless, but I would like to perform some more investigation into whether we need to continue with this task or just close as invalid.


Before I answer the questions I think we should confirm the source of the problem. Tallying in SecurePoll is possible of course but requires:

  • A dump file
  • Some way to decrypt the dump file (i.e. the keys, which are removed from the dump by default - T297808)
  • An installation of SecurePoll (either on votewiki or on a locally-hosted wiki - I guess any Wikimedia wiki could technically do it?)

This is not insurmountable for an observer but does potentially require advanced access that not everyone will have. I can see three possible solutions:

  1. We use a dump format that is accessible and will allow people to tally the file with a script or some tallying tool, be that SecurePoll itself or something like OpaVote
  2. We create a tool to tally the vote somewhere, accessible to any observer
  3. We use a dump format that is human-readable

I like the .blt format used by OpaVote and which is generated by the tool @dom_walden put together but I don't know how it technically differs from the .securepoll format used in SecurePoll (and fundamentally, which one is "better" or more useful).

Thank you very much! This was already very helpful.

I did some additional research, which I want to share here:

Current tools to dump elections
In Extension:SecurePoll, there are three "dump" CLI tools at the moment:

  • dump.php
  • dumpComments.php
  • dumpVoteCsv.php

As far as I can tell, the only one relevant to this topic may be dump.php. It produces a proprietary XML format that looks something like this:

<SecurePoll>
    <election>
        <configuration>
            ...
        <vote>...
        <vote>...
...

T297808 speaks of the need to have GPG keys in the XML, which is apparently already possible by using the --private parameter.
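
For illustration, here is a minimal Python sketch of how an observer might take a first look at such a dump. It relies only on the element names visible in the excerpt above (SecurePoll, election, configuration, vote). The sample document is made up, and the content of each vote element is treated as opaque, since encrypted records would have to be decrypted first.

```python
import xml.etree.ElementTree as ET

# Made-up sample that mirrors only the element names shown in the excerpt.
# Real dumps contain far more data, and vote bodies may be encrypted.
SAMPLE_DUMP = """<SecurePoll>
    <election>
        <configuration></configuration>
        <vote>record-1</vote>
        <vote>record-2</vote>
    </election>
</SecurePoll>"""


def count_votes(xml_text):
    """Return the number of <vote> elements for each <election> in a dump."""
    root = ET.fromstring(xml_text)
    return [len(election.findall("vote")) for election in root.iter("election")]


print(count_votes(SAMPLE_DUMP))  # [2]
```

Counting records this way does not require the keys, so it could serve as a quick sanity check before any tally is attempted.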

BLT file format
Unfortunately, I couldn't find an "official standard description" for this format. But I found some examples and implementations:

Even though files of this format can be found in the SecurePoll-Test-Data repo, I couldn't find any implementation within Extension:SecurePoll itself.

The format seems pretty simple and apparently is some de facto standard. It looks like a good choice.
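
To show how simple the format is, here is a rough Python sketch of a reader. It assumes the layout commonly seen in BLT files, not an official specification: a first line with the number of candidates and seats, an optional line of negative numbers for withdrawn candidates, ballot lines of the form "weight pref1 pref2 ... 0", a lone "0" to end the ballots, then the quoted candidate names and a quoted title. The function name parse_blt and the sample data are made up for this example.

```python
import shlex

# Made-up sample election in the commonly seen BLT layout.
SAMPLE_BLT = '''4 2
-2
3 1 3 0
2 4 1 0
1 3 0
0
"Alice"
"Bob"
"Carol"
"Dave"
"Sample election"
'''


def parse_blt(text):
    """Parse BLT text into a dict (handles only the layout described above)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_candidates, n_seats = (int(x) for x in lines[0].split())
    withdrawn, ballots = [], []
    i = 1
    # Optional line of negative numbers listing withdrawn candidates.
    if lines[i].startswith("-"):
        withdrawn = [-int(x) for x in lines[i].split()]
        i += 1
    # Ballot lines end with 0; a line that is just "0" ends the ballot list.
    while lines[i] != "0":
        nums = [int(x) for x in lines[i].split()]
        if nums[-1] != 0:
            raise ValueError(f"ballot line does not end with 0: {lines[i]!r}")
        ballots.append((nums[0], nums[1:-1]))
        i += 1
    i += 1  # skip the terminating "0"
    names = [shlex.split(ln)[0] for ln in lines[i:i + n_candidates]]
    title = shlex.split(lines[i + n_candidates])[0]
    return {"seats": n_seats, "withdrawn": withdrawn, "ballots": ballots,
            "candidates": names, "title": title}


election = parse_blt(SAMPLE_BLT)
print(election["candidates"], election["ballots"])
```

Candidates are referred to by their 1-based position in the name list, so a ballot (3, [1, 3]) means three identical ballots ranking Alice first and Carol second.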

Resulting proposal
We should probably create a new CLI tool called dumpBLT.php within Extension:SecurePoll. In addition, we could create a PHP library (or maybe two) for building/parsing BLT files (e.g. wikimedia/blt-tools, or wikimedia/blt-builder and wikimedia/blt-parser; the "parsing" part does not actually seem to be relevant for this task).

Regarding:

I wonder if the core of the problem needing solved here is actually the encryption piece making it technically difficult for observers to run their own tallies...

Well, the BLT dump could also be made available via the web interface. One would still need to enter a key for decryption, but would be provided with a BLT file rather than with a final tally result. This way people could still tally on their own, using the raw data from the BLT file.

From https://www.mediawiki.org/wiki/Extension:SecurePoll#Tally_a_poll

  • If the poll is encrypted, you may need to enter more information. For example, if you used GnuPG, enter the (private) decryption key and click on the tally button. [...]

I'm delighted to see this task. My professional work includes election auditing, and transparency is of the utmost importance in elections.

I'll clarify that a major motivation for this work is to enable independent observers to:

  • Obtain the cast vote records which indicate how each ballot was marked
  • Exactly reproduce the official results, round-by-round.

The margin of victory in the 2019 ASBS election for two board seats was one ballot. The upcoming 2024 election is via STV with 4 winners, and the more winners there are, the more likely it is that the margin is close. So we should be prepared for a very close margin.

There are a wide variety of STV algorithms and configuration parameters. Any variation in the STV configuration or in which ballots are decided to be eligible may end up changing the outcome. So we need to be very precise about how the eligibility checking and tally will be done, and we need to help observers to exactly reproduce the official results.

It would be best to make it as easy as possible for observers to run the same code on the same ballots, with the same floating-point arithmetic standard etc.
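
As a small illustration of why the arithmetic matters, the Python sketch below adds up ten ballot shares of one tenth of a vote each, once with binary floating point and once with exact rationals. The scenario and the quota of 1 are made up; this is not how SecurePoll's tallier represents votes.

```python
from fractions import Fraction


def add_up(values, zero):
    """Add values one at a time, the way a tally accumulates transfers."""
    acc = zero
    for value in values:
        acc = acc + value
    return acc


QUOTA = 1  # made-up quota for the illustration

float_total = add_up([0.1] * 10, 0.0)
exact_total = add_up([Fraction(1, 10)] * 10, Fraction(0))

print(float_total >= QUOTA)  # False: rounding leaves the total just below 1
print(exact_total >= QUOTA)  # True: exact rationals reach the quota
```

In a contest decided by one ballot, a rounding difference of this kind could in principle change who reaches the quota, which is why observers would want the same arithmetic as the official tally.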

With that said, I like this option from @jrbs

  • We use a dump format that is accessible and will allow people to tally the file with a script or some tallying tool, be that SecurePoll itself or something like OpaVote

I think BLT is a useful standard and a good option, so long as there is a way for folks to feed BLT data into SecurePoll's STVTallier.php code.

I'd love it if observers could do the tallying via the command line on Linux or via a Jupyter notebook that has a PHP kernel.

Okay, so we should do the following:

  1. Allow an export of the full election in BLT format. This will contain all the ballots and is basically the raw data from the election. Ideally we allow this through the web interface (e.g. by creating a _public_ special page that lists all finished elections and putting a "download BLT" link next to each one).
  2. Create a dedicated cli/tallyBLT.php (maybe even use Python to overcome the float precision issue). We make it MediaWiki independent, so this script can be downloaded and run by anyone, anywhere. It will output either JSON, or a human readable plaintext result (or HTML?)
  3. Create a cli/addTallyResult.php that fills the securepoll_properties table with the result from step 2 (this is currently done in cli/tally.php in one step)

Items 2 and 3 basically just split up current cli/tally.php, thus making it more accessible for third-party users.
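
To make step 2 more concrete, here is a very rough Python sketch of what the start of a MediaWiki-independent tally could look like. It only counts first preferences, computes a quota, and prints JSON. It is not the STVTallier algorithm: the function name first_round, the JSON keys, and the choice of one common Droop quota definition (votes divided by seats plus one, rounded down, plus one) are all assumptions made for the example.

```python
import json
from collections import Counter


def first_round(ballots, seats):
    """Count first preferences; ballots are (weight, [candidate ids]) pairs."""
    counts = Counter()
    for weight, prefs in ballots:
        if prefs:  # ballots without any preference are not counted
            counts[prefs[0]] += weight
    valid = sum(counts.values())
    quota = valid // (seats + 1) + 1  # one common Droop quota definition
    return {
        "seats": seats,
        "valid_votes": valid,
        "quota": quota,
        "first_preferences": {str(c): n for c, n in sorted(counts.items())},
        "elected": sorted(c for c, n in counts.items() if n >= quota),
    }


# Made-up ballots: 4 voters rank 1 then 2, 3 rank 2 then 1, and so on.
ballots = [(4, [1, 2]), (3, [2, 1]), (2, [3, 2]), (1, [])]
result = first_round(ballots, seats=2)
print(json.dumps(result, indent=2))
```

A full script would continue with surplus transfers and eliminations, and step 3 would then only need to read the JSON and store it.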

What do you think?

Update after yesterday's meeting with @jrbs:
We will implement it as @Osnard described it.

Confirming here @Lhuyghe's comment, I think the steps sound very reasonable.

Thanks! I'd be delighted to help test this when you have working code.

One output format to consider would be something that is accepted by RCVis: https://www.rcvis.com/upload.html
Then we get a very nice set of ways to visualize the results.

Offhand the JSON options look easier to parse, and perhaps RCTab format is easier for humans to decipher.

One output format to consider would be something that is accepted by RCVis

Can you please provide more information? Is there a description of what formats are accepted by RCVis?

  • RCTab format (JSON)
  • Opavote's Online Elections (JSON)
  • ElectionBuddy Preferential Voting (CSV)
  • Dominion's "RCV Short Report" (XLSX)

We should probably use "Opavote's Online Elections (JSON)"

As a first step, I would propose not creating the independent cli/tallyBLT.php for now, but rather enhancing the existing cli/tally.php so that

  • a .blt file can additionally be used as input (besides an election name or the dump from cli/dump.php)
  • the result can additionally be output as "Opavote's Online Elections (JSON)"

Note: The other output options right now are

  • HTML, which throws an error on my machine, but I have not yet looked into it (PHP Fatal error: OOUI\Exception: OOUI\Theme::singleton was called with no singleton theme set. in /app/wikimedia/vendor/oojs/oojs-ui/php/Theme.php:31)
  • and default: Text, which is actually HTML

The reason for this is, if we create the standalone script without any dependencies, we would need to recreate the STVTallier algorithm. We could not reuse the existing one.
Because we probably do not want two implementations living side by side, we would need to delete the MediaWiki-dependent implementation MediaWiki\Extension\SecurePoll\Talliers\STVTallier and use only the one in the cli/tallyBLT.php script.
That is a separate task from the "import .blt, output JSON" task.

Another question for the standalone script:
Is every .blt file a STVElection? If not, we would need to implement all other tallier algorithms as well.

Is every .blt file a STVElection? If not, we would need to implement all other tallier algorithms as well.

BLT is often used for IRV elections also. IRV is generally the same as STV with a single winner, as I understand it, so that should come without any extra work. There may be some corner cases, and certainly a plethora of options, but I wouldn't worry about trying to support options we don't use internally. And I don't think we'd need to implement other methods, even though you could certainly use BLT to document cast vote records from any ranked-choice election, and tally it via any of the Condorcet methods, borda count, round-robin, etc.
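
To illustrate the single-winner case, here is a minimal IRV sketch in Python over BLT-style ballots, given as (weight, preference list) pairs. It is a textbook version, not SecurePoll's tallier: the function name irv_winner is made up, and ties are broken by lowest candidate number purely as a placeholder assumption.

```python
from collections import Counter


def irv_winner(ballots, candidates):
    """Eliminate the weakest candidate until one has a majority of active votes."""
    remaining = set(candidates)
    while True:
        counts = Counter({c: 0 for c in remaining})
        for weight, prefs in ballots:
            # Each ballot counts for its highest-ranked remaining candidate.
            top = next((c for c in prefs if c in remaining), None)
            if top is not None:
                counts[top] += weight
        active = sum(counts.values())
        # Ties go to the lowest candidate number (placeholder rule only).
        leader = max(sorted(remaining), key=lambda c: counts[c])
        if counts[leader] * 2 > active or len(remaining) == 1:
            return leader
        loser = min(sorted(remaining), key=lambda c: counts[c])
        remaining.remove(loser)


# Made-up ballots: candidate 1 leads on first preferences, but candidate 2
# wins once candidate 3 is eliminated and those ballots transfer.
ballots = [(4, [1]), (3, [2, 3]), (2, [3, 2])]
print(irv_winner(ballots, [1, 2, 3]))  # 2
```

With one seat the Droop quota is a simple majority, which is why this loop matches the single-winner case of STV apart from tie-breaking and other configuration details.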