Spike submissions stats #2766
Draft
thomasiles wants to merge 4 commits into
Add two new settings for use when calculating the total submissions. `total_submissions_baseline` is the number of submissions before the baseline cut-off date.

To calculate a total submission figure over the lifetime of the service, we need a baseline to start from. This is for a few reasons:
- CloudWatch only retains data for a maximum of 15 months, so we can't query it for all data.
- We are only using the new CloudWatch metric name. The old name will fall outside the retention window in July, so it doesn't seem worth including in our stats.
- We haven't always used CloudWatch, so we need to add in the stats collected before it was available.

These are settings because that seemed like the easiest way for us to store and update the values. The setting is not scoped per environment, so it should only be set to the value for production. This is a limitation we might want to change in the future.
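As a rough illustration of how the baseline combines with the CloudWatch data, here is a minimal Python sketch. It is not the code in this PR: the function name `lifetime_total`, the dictionary shape and all figures are made up for the example.

```python
from datetime import date


def lifetime_total(baseline, cutoff, daily_counts):
    """Return the baseline plus every daily count dated on or after the cut-off.

    `baseline` stands in for the total_submissions_baseline setting;
    `daily_counts` maps a date to the number of submissions on that day.
    """
    return baseline + sum(
        count for day, count in daily_counts.items() if day >= cutoff
    )


# Illustrative figures only.
daily_counts = {
    date(2025, 1, 30): 40,   # before the cut-off, so already part of the baseline
    date(2025, 2, 1): 120,
    date(2025, 2, 2): 95,
}
total = lifetime_total(10_000, date(2025, 2, 1), daily_counts)
print(total)  # 10215
```

Counts dated before the cut-off are skipped so that they are not counted twice.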
To show submission metrics, we query CloudWatch for all submission data between the start of the baseline period and the current time. For this to work, the service needs permission to call `cloudwatch:GetMetricData`. This can be set in the ECS IAM policy alongside `cloudwatch:GetMetricStatistics`. We get back an array of datapoints, one for each day in the period. We then use these values to calculate daily, weekly, monthly and yearly values.
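To make the shape of the returned data concrete, the sketch below turns one GetMetricData result into a date-to-count mapping. It is written in Python for illustration and is not the PR's implementation. The `Timestamps` and `Values` keys follow the GetMetricData response format; the `sample` data and the function name `daily_counts` are invented. Filling missing days with zero is a defensive assumption in case a day with no submissions has no datapoint.

```python
from datetime import date, datetime, timedelta, timezone


def daily_counts(result, start, end):
    """Map each day from start to end (inclusive) to its submission count.

    `result` is one entry of a GetMetricData response's MetricDataResults,
    with parallel "Timestamps" and "Values" lists. Days without a datapoint
    are given a count of 0.
    """
    by_day = {
        ts.date(): int(value)
        for ts, value in zip(result["Timestamps"], result["Values"])
    }
    days = (end - start).days + 1
    return {
        start + timedelta(days=i): by_day.get(start + timedelta(days=i), 0)
        for i in range(days)
    }


# Hand-written stand-in for a real response entry (illustrative values).
sample = {
    "Id": "submissions",
    "Timestamps": [
        datetime(2025, 3, 3, tzinfo=timezone.utc),
        datetime(2025, 3, 1, tzinfo=timezone.utc),
    ],
    "Values": [7.0, 12.0],
}
counts = daily_counts(sample, date(2025, 3, 1), date(2025, 3, 3))
```

The result is ordered by day regardless of the order in which the datapoints were returned.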
Add a new value to the features report for the live-or-archived tag. It shows the total number of submissions.
Add a new report which shows stats for submissions.
🎉 A review copy of this PR has been deployed! You can reach it at: https://pr-2766.admin.review.forms.service.gov.uk/

It may take 5 minutes or so for the application to be fully deployed and working. If it still isn't ready

For the sign in details and more information, see the review apps wiki page.
Spike for showing total submission stats
Trello card: https://trello.com/c/usentmeF/2918-timebox-2-days-spike-to-add-submission-data-into-the-admin-reports
This spike explores creating a report of submissions for a set period and adding a manually entered baseline to the results.
There are two big issues which prevent this feature being used as the source of truth for our submission stats: the stats only go back 15 months, and they include test submissions.
To solve the issue with stats only going back 15 months, we could:
The second issue, test submissions being included in our stats, is harder. I think the easiest solution is to add another dimension, organisation, to our submission stats. We could then exclude a list of organisations which we know produce test stats. This would be the org our end-to-end tests use and the internal forms team.
This spike doesn't do that.
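If we did add an organisation dimension, the exclusion step might look something like the Python sketch below. This is hypothetical: the organisation identifiers, the figures and the function name `total_excluding` are invented for illustration, and the per-organisation counts are assumed to have been fetched already.

```python
def total_excluding(counts_by_org, excluded_orgs):
    """Sum submission counts, skipping organisations known to produce test data."""
    return sum(
        count for org, count in counts_by_org.items() if org not in excluded_orgs
    )


# Hypothetical organisation identifiers and counts.
counts_by_org = {
    "org-a": 300,
    "org-b": 150,
    "e2e-tests": 1000,   # the org used by end-to-end tests
    "forms-team": 25,    # the internal forms team
}
EXCLUDED_ORGS = {"e2e-tests", "forms-team"}

real_total = total_excluding(counts_by_org, EXCLUDED_ORGS)
print(real_total)  # 450
```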
How we store submission counts
We store submission counts using AWS CloudWatch metrics. Every time a form is submitted we add a count for <formid, environment>.
These metrics are stored for a maximum of 15 months at a granularity of 1 day.
We replay these metrics to form creators on the "live" view of a form in Admin.
We don't store these metrics in our database.
Every time a submission total for a form is displayed, we issue a query to AWS.
This keeps the code simple and ensures the data is always fresh.
But it stops us showing stats for anything more than 15 months old.
It also means the stats could go down as well as up, because old datapoints drop out of the retention window.
This makes keeping an overall total based on these results harder.
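A small Python sketch of why a total based only on the retained data can shrink. The 455-day window is an approximation of 15 months, and the dates and counts are invented; none of this is code from the spike.

```python
from datetime import date, timedelta

RETENTION_DAYS = 455  # assumption: roughly 15 months


def window_total(counts, as_of, retention_days=RETENTION_DAYS):
    """Sum the daily counts still inside the retention window on `as_of`."""
    oldest = as_of - timedelta(days=retention_days)
    return sum(n for day, n in counts.items() if oldest < day <= as_of)


# Illustrative data: one busy day long ago, two quieter recent days.
counts = {
    date(2024, 1, 10): 500,
    date(2025, 3, 1): 20,
    date(2025, 4, 20): 30,
}

march_total = window_total(counts, date(2025, 3, 31))
april_total = window_total(counts, date(2025, 4, 30))
print(march_total, april_total)  # 520 50
```

Between the two dates there were 30 new submissions, yet the windowed total fell because the 500 from January 2024 aged out. The manually entered baseline avoids this.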
Showing totals
To show total submissions, we can query AWS to get a sum of the metrics for all forms in an environment for each day for the last 12 months (or anything less than 15 months).
This is slow but lets us calculate lots of statistics, like daily, weekly and monthly submissions as well as partial counts for the current day, week, and month.
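The kind of statistics described above can be derived from the daily values with a few sums. The Python sketch below is illustrative rather than the PR's code: it assumes weeks start on Monday, takes a date-to-count dictionary as input, and uses invented figures and function names.

```python
from collections import Counter
from datetime import date, timedelta


def monthly_totals(counts):
    """Group daily counts into (year, month) totals."""
    totals = Counter()
    for day, n in counts.items():
        totals[(day.year, day.month)] += n
    return dict(totals)


def to_date_counts(counts, today):
    """Partial counts for the current day, week (from Monday) and month."""
    week_start = today - timedelta(days=today.weekday())
    month_start = today.replace(day=1)

    def since(start):
        return sum(n for day, n in counts.items() if start <= day <= today)

    return {
        "day": counts.get(today, 0),
        "week": since(week_start),
        "month": since(month_start),
    }


# Illustrative daily counts.
counts = {
    date(2025, 2, 27): 5,
    date(2025, 2, 28): 3,
    date(2025, 3, 1): 4,    # Saturday
    date(2025, 3, 3): 10,   # Monday
    date(2025, 3, 5): 6,    # Wednesday
}
partial = to_date_counts(counts, today=date(2025, 3, 5))
by_month = monthly_totals(counts)
```

Because everything is computed from the one set of daily datapoints, the slow query only needs to run once per report.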
There is a problem with this approach.
The query returns all submissions, including submissions for test forms.
Our end-to-end tests create and submit a large number of forms: over 1,000 a month on dev.
To calculate the current figures, Anne queries Splunk and uses set queries to remove the values based on the titles of the test forms.
Things to consider when reviewing