
Spike submissions stats #2766

Draft
thomasiles wants to merge 4 commits into main from spike-submissions-stats


@thomasiles

Spike for showing total submission stats

Trello card: https://trello.com/c/usentmeF/2918-timebox-2-days-spike-to-add-submission-data-into-the-admin-reports

This spike explores creating a report of submissions for a set period, adding a manually entered baseline to the results.

There are two big issues which prevent this feature being used as the source of truth for our submission stats.

  • Our stats aren't saved for longer than 15 months
  • The stats include test submissions, which make them much less useful

To solve the issue with stats only going back 15 months, we could:

  • Start storing daily submission totals. We could use a batch job to fetch them from AWS and then calculate totals from the stored stats.
  • Only show stats for fixed time periods, e.g. the last 6 months. This avoids the need to store them, but means we can't show an all-time total.
  • Keep a baseline number to add to the rolling stats. We would need to update this figure periodically. This would let us show total stats, at the cost of having to keep the figure up to date. This is the approach this spike took.
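The first option could look something like the sketch below. It assumes a batch job has already fetched daily totals from CloudWatch; the table and function names are hypothetical, and SQLite stands in for the real database so the example runs on its own. Keying rows by day means re-fetching a day overwrites its total instead of double counting it.

```python
from __future__ import annotations

import sqlite3
from datetime import date


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per day, keyed by day so re-imports overwrite.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_submission_totals ("
        "day TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )


def store_daily_totals(conn: sqlite3.Connection, totals: dict[date, int]) -> None:
    # Upsert each day's total as fetched from CloudWatch by the batch job.
    conn.executemany(
        "INSERT INTO daily_submission_totals (day, total) VALUES (?, ?) "
        "ON CONFLICT(day) DO UPDATE SET total = excluded.total",
        [(day.isoformat(), total) for day, total in totals.items()],
    )
    conn.commit()


def lifetime_total(conn: sqlite3.Connection) -> int:
    # Sum every stored day, including days CloudWatch no longer retains.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM daily_submission_totals"
    ).fetchone()
    return total
```

Storing 10 and 5 for two days, then re-fetching the second day as 7, leaves a lifetime total of 17. Because rows stay in the table after CloudWatch drops the underlying data, the total keeps growing past the 15-month window.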

The second issue, test submissions being included in our stats, is harder. I think the easiest solution is to add another dimension to our submission stats: organisation. We could then exclude a list of organisations which we know produce test submissions: the organisation our end-to-end tests use and the internal forms team's organisation.
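A minimal sketch of the organisation-based exclusion, assuming each submission count carried a hypothetical `organisation` value alongside the form ID and environment. The names in `EXCLUDED_ORGANISATIONS` are placeholders, not real organisation identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholders for organisations known to produce test submissions.
EXCLUDED_ORGANISATIONS = frozenset({"end-to-end-tests", "forms-team"})


@dataclass(frozen=True)
class SubmissionCount:
    form_id: str
    environment: str
    organisation: str  # the proposed extra dimension
    count: int


def total_excluding_test_orgs(
    counts: list[SubmissionCount],
    environment: str,
    excluded: frozenset[str] = EXCLUDED_ORGANISATIONS,
) -> int:
    # Sum counts for one environment, skipping any excluded organisation.
    return sum(
        c.count
        for c in counts
        if c.environment == environment and c.organisation not in excluded
    )
```

Keeping the exclusion list as data rather than matching on form titles means new test forms are excluded automatically, as long as they belong to a listed organisation.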

This spike doesn't do that.

How we store submission counts

We store submission counts using AWS CloudWatch metrics. Every time a form is submitted we add a count for <formid, environment>.
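To illustrate the write side, here is how one submission might be recorded with boto3's `put_metric_data`. The namespace `Forms`, metric name `Submitted` and dimension names `FormId` and `Environment` are hypothetical; the PR doesn't name them. The client is passed in so the function can be exercised without AWS credentials.

```python
from __future__ import annotations

from typing import Any

NAMESPACE = "Forms"  # hypothetical namespace
METRIC_NAME = "Submitted"  # hypothetical metric name


def build_submission_metric(form_id: str, environment: str) -> dict[str, Any]:
    # One datapoint with value 1 for the <form ID, environment> pair.
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                "MetricName": METRIC_NAME,
                "Dimensions": [
                    {"Name": "FormId", "Value": form_id},
                    {"Name": "Environment", "Value": environment},
                ],
                "Value": 1,
                "Unit": "Count",
            }
        ],
    }


def record_submission(cloudwatch_client, form_id: str, environment: str) -> None:
    # cloudwatch_client is expected to be boto3.client("cloudwatch").
    cloudwatch_client.put_metric_data(**build_submission_metric(form_id, environment))
```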

These metrics are stored for a maximum of 15 months at a granularity of 1 day.
We show these metrics to form creators on the "live" view of a form in Admin.

We don't store these metrics in our database.
Every time a submission total for a form is displayed we are issuing a query to AWS.
This keeps the code simple and ensures the data is always fresh.
But it stops us showing stats for anything more than 15 months ago.
It also means the stats can go down as well as up: as old datapoints pass out of the retention window, they drop out of any total calculated from them.
This makes keeping an overall total based on these results harder.
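A small worked example of why a total can shrink. It assumes a 455-day window (CloudWatch's 15-month retention for hourly data) and uses made-up daily figures purely for illustration; `window_total` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import date, timedelta

RETENTION_DAYS = 455  # roughly 15 months


def window_total(daily: dict[date, int], today: date, retention_days: int = RETENTION_DAYS) -> int:
    # Only days still inside the retention window are visible to a query.
    cutoff = today - timedelta(days=retention_days)
    return sum(count for day, count in daily.items() if cutoff < day <= today)


daily = {date(2023, 1, 1): 500, date(2024, 3, 1): 10}
print(window_total(daily, date(2024, 3, 1)))
print(window_total(daily, date(2024, 5, 1)))
```

The window total is 510 on 1 March 2024 but only 10 on 1 May 2024, even though no submissions were deleted: the January 2023 datapoint has aged out.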

Showing totals

To show total submissions, we can query AWS for the sum of the metrics for all forms in an environment for each day of the last 12 months (or any period shorter than 15 months).
This is slow but lets us calculate lots of statistics, like daily, weekly and monthly submissions as well as partial counts for the current day, week, and month.
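Partial counts for the current period can be derived from the daily sums alone. This sketch assumes the datapoints have been turned into a mapping of date to count and that weeks start on Monday; `to_date_totals` is a hypothetical name.

```python
from __future__ import annotations

from datetime import date, timedelta


def to_date_totals(daily: dict[date, int], today: date) -> dict[str, int]:
    # Calendar-to-date sums; the current period is still in progress, so these are partial.
    week_start = today - timedelta(days=today.weekday())  # Monday of this week
    month_start = today.replace(day=1)

    def total_since(start: date) -> int:
        return sum(count for day, count in daily.items() if start <= day <= today)

    return {
        "today": daily.get(today, 0),
        "this_week": total_since(week_start),
        "this_month": total_since(month_start),
    }
```

For Wednesday 15 May 2024, `this_week` covers 13 to 15 May and `this_month` covers 1 to 15 May.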

There is a problem with this approach.
The query returns all submissions, including submissions to test forms.
Our end-to-end tests create and submit a large number of forms: over 1,000 a month on dev.

To calculate the current figures, Anne queries Splunk using a set of queries that remove submissions to test forms, identified by their titles.

Things to consider when reviewing

  • Ensure that you consider the wider context.
  • Does it work when run on your machine?
  • Is it clear what the code is doing?
  • Do the commit messages explain why the changes were made?
  • Are all the necessary unit tests included?
  • Do the end-to-end tests need updating before these changes will pass?
  • Has all relevant documentation been updated?

Add two new settings for use when calculating the total submissions.

`total_submissions_baseline` is the number of submissions before the baseline cut-off date.

To calculate a total submission figure over the lifetime of the
service, we need a baseline to start from. This is for a few reasons:

- CloudWatch only retains data for a maximum of 15 months so we can't
  query it for all data.
- We are only using the new CloudWatch metric name. The old name will
  pass through the retention window in July so it doesn't seem worth
  including in our stats.
- We haven't always used CloudWatch so we need to add in the stats
  collected before it was available.
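The calculation the baseline feeds into can be sketched as follows. Only `total_submissions_baseline` is named in this PR; the cut-off date parameter and the function name are hypothetical. The sketch assumes the baseline covers every submission before the cut-off, so only CloudWatch datapoints on or after it are added.

```python
from __future__ import annotations

from datetime import date


def total_with_baseline(
    total_submissions_baseline: int,
    baseline_cutoff: date,  # hypothetical: the baseline cut-off date setting
    daily: dict[date, int],
) -> int:
    # Submissions before the cut-off are already counted in the baseline,
    # so only add the CloudWatch daily sums from the cut-off onwards.
    since_cutoff = sum(count for day, count in daily.items() if day >= baseline_cutoff)
    return total_submissions_baseline + since_cutoff
```

For this to stay correct, the cut-off has to remain inside CloudWatch's retention window; once it ages out, the baseline needs moving forward, which is the periodic update this approach depends on.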

These are settings because that seemed like the easiest way for us to
store and update them. The settings are not scoped per environment, so
they should only be set to the production values. This is a limitation
we might want to change in the future.

To show submission metrics, we query CloudWatch for all submission data
between the start of the baseline period and the current time.

For this service to work, we need permission to call
`cloudwatch:GetMetricData`. This can be granted in the ECS IAM
policy alongside `cloudwatch:GetMetricStatistics`.
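For reference, an IAM policy statement granting both actions could look like the following, written as a Python dict and serialised to JSON. This is an assumption about shape only; the actual ECS policy isn't shown here. CloudWatch's metric read actions don't support resource-level permissions, so the resource is `*`.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:GetMetricData",
            ],
            # These read actions can't be scoped to specific metrics.
            "Resource": "*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```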

We get back an array of datapoints, one for each day in the period.
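A sketch of issuing the query and reading the datapoints back with boto3's `get_metric_data`, which returns parallel `Timestamps` and `Values` lists for each query and pages with `NextToken`. The namespace, metric name and dimension names in `daily_sum_query` are hypothetical, and the query covers a single form; how the spike sums across every form in an environment isn't described here, so that part is left out.

```python
from __future__ import annotations

from datetime import date, datetime
from typing import Any

ONE_DAY_SECONDS = 86_400


def daily_sum_query(form_id: str, environment: str) -> dict[str, Any]:
    # Hypothetical namespace, metric and dimension names.
    return {
        "Id": "submissions",
        "MetricStat": {
            "Metric": {
                "Namespace": "Forms",
                "MetricName": "Submitted",
                "Dimensions": [
                    {"Name": "FormId", "Value": form_id},
                    {"Name": "Environment", "Value": environment},
                ],
            },
            "Period": ONE_DAY_SECONDS,  # one datapoint per day
            "Stat": "Sum",
        },
        "ReturnData": True,
    }


def fetch_daily_totals(client, query: dict[str, Any], start: datetime, end: datetime) -> dict[date, float]:
    # client is expected to be boto3.client("cloudwatch").
    totals: dict[date, float] = {}
    kwargs: dict[str, Any] = {"MetricDataQueries": [query], "StartTime": start, "EndTime": end}
    while True:
        response = client.get_metric_data(**kwargs)
        for result in response["MetricDataResults"]:
            if result["Id"] != query["Id"]:
                continue
            # Timestamps and Values are parallel lists for this query.
            for timestamp, value in zip(result["Timestamps"], result["Values"]):
                totals[timestamp.date()] = value
        token = response.get("NextToken")
        if not token:
            return totals
        kwargs["NextToken"] = token  # fetch the next page of the same query
```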

We then use these values to calculate daily, weekly, monthly and yearly
values.
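Grouping the daily datapoints into longer periods might look like this. It's a sketch under two assumptions: the data is a mapping of date to count, and weeks are ISO weeks, so a week that spans New Year stays in one bucket. `bucket_totals` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def bucket_totals(daily: dict[date, int]) -> dict[str, dict]:
    # Group the daily datapoints into ISO weeks, calendar months and years.
    weekly: dict[tuple[int, int], int] = defaultdict(int)
    monthly: dict[tuple[int, int], int] = defaultdict(int)
    yearly: dict[int, int] = defaultdict(int)
    for day, count in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += count
        monthly[(day.year, day.month)] += count
        yearly[day.year] += count
    return {
        "daily": dict(daily),
        "weekly": dict(weekly),
        "monthly": dict(monthly),
        "yearly": dict(yearly),
    }
```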

Add a new value to the features report for the live-or-archived tag.

It shows the total number of submissions.

Add a new report which shows stats for submissions.

@github-actions

🎉 A review copy of this PR has been deployed! You can reach it at: https://pr-2766.admin.review.forms.service.gov.uk/

It may take 5 minutes or so for the application to be fully deployed and working. If it still isn't ready
after 5 minutes, there may be something wrong with the ECS task. You will need to go to the integration AWS account
to debug, or otherwise ask an infrastructure person.

For the sign in details and more information, see the review apps wiki page.

