Skip to content

URW-178-API caching#158

Merged
sebastian-king merged 19 commits intodevelopfrom
URW-178-cache
Nov 17, 2024
Merged

URW-178-API caching#158
sebastian-king merged 19 commits intodevelopfrom
URW-178-cache

Conversation

@Kenneth-W-Chen
Copy link
Copy Markdown
Member

  • Created 2 tables to cache API request responses and a config for each cache
    • So far, only used for Printful API (excludes orders)
  • Current PHP functions let you get cached content from the table, add stuff to the cache, clear out the cache (or remove one item from the cache), and add/update cache configs

@Kenneth-W-Chen Kenneth-W-Chen self-assigned this Jun 28, 2024
Comment thread api/printful/printful.php Outdated
Comment thread api/printful/printful.php Outdated
$isOrder =substr($URI, 0, 6) == 'orders';
if(!$isOrder){ // check cache, ignore orders because they shouldn't be cached
$r = getCached($endpoint);
if($r!== null){ // endpoint is in cache
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like for the api-cache to be able to handle the cache checking, returning and insertion without needing to implement this everywhere we use the cache. Could the getCached method take a instance of $ch here where the client just sets the curlopts and it sends the request if needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 7cb4d48. Only caveat is that I made a new object to encapsulate some cURL info and return the object instead of just the content. I only did that because the PrintfulCustomAPIExceptions checked the HTTP response code, which isn't 200 if a cached item is returned (because curl_exec wasn't called).
I can have it pass the result only if we switch to throwing generic exceptions instead. Or we could pass the exception type name as a string in the parameter? Let me know if you want one of these instead of the current implementation

Comment thread api/printful/printful.php Outdated
Comment thread api/api-cache.php
Comment thread api/api-cache.php Outdated
Comment thread api/api-cache.php Outdated
Comment thread api/api-cache.php Outdated
if ($r) {
$id = $r['id'];
$queryString = 'UPDATE api_cache
SET last_successfully_retrieved = UTC_TIMESTAMP,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to get fancy if you want and set the DB column to auto update the last_successfully_retrieved timestamp each time the content column updates -- not required at all, just in case you wanted to experiement with that

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't get ON UPDATE to work with UTC_TIMESTAMP. Apparently, it's a bug https://bugs.mysql.com/bug.php?id=103228

Comment thread api/api-cache.php Outdated
Comment thread api/api-cache.php Outdated
if ($config_name) {
return $db->query('INSERT INTO outgoing_request_cache_config (ttl, config_name) VALUES (' . $ttl . ', "' . $db->real_escape_string($config_name) . '")');
}
return $db->query('INSERT INTO outgoing_request_cache_config (ttl) VALUES (' . $ttl . ')');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config_name is always required, we should not allow it to be inserted empty, in fact, we should update the SQL to make it a non nullable column

Comment thread api/api-cache.php Outdated
Comment thread api/api-cache.php Outdated
Comment thread api/api-cache.php Outdated
Comment thread api/api-cache.php Outdated
$configId = $db->query('SELECT * FROM outgoing_request_cache_config WHERE config_name = "' . $db->real_escape_string($config) . '"')->fetch_array(MYSQLI_ASSOC)['id'];
}
$queryString = "INSERT INTO api_cache (endpoint, last_successfully_retrieved, last_attempted_retrieval, config_id, content)
values ('" . $db->real_escape_string($endpoint) . "', UTC_TIMESTAMP,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to using CURRENT_TIMESTAMP for things like this, out of curiosity, is that just an alias for UTC_TIMESTAMP or is there actually a timezone difference between those two built in methods?

Copy link
Copy Markdown
Member Author

@Kenneth-W-Chen Kenneth-W-Chen Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CURRENT_TIMESTAMP returns the timestamp in the server's timezone. I was using UTC since I wasn't sure what the timezone on the server was, and I needed UTC since time() in PHP is a Unix timestamp (which is UTC).
For similar reasons, I was appending UTC to the retrieval time when converting last_successfully_retrieved to a Unix timestamp. I think strtotime() uses the server's timezone by default

Comment thread api/printful/printful.php Outdated
@@ -1,5 +1,5 @@
<?php

require(BASE . '/api/api-cache.php');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a guarantee that BASE is defined here? I think you might have just been lucky that BASE is also provided upstream, and you should actually be importing the template/top first here.
FYI, php will only re-import the template/top here if it wasn't already imported upstream.

Comment thread api/api-cache.php Outdated
}
return false;
}
return null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the beauty of PHP's dynamic and mixed types. I miss it.

Comment thread api/api-cache.php Outdated
$config = $db->query('SELECT * FROM outgoing_request_cache_config WHERE id = ' . $r['config_id'])->fetch_array(MYSQLI_ASSOC);
$ttl = $config['ttl'];
$now = time();
$retrievalTime = strtotime($r['last_successfully_retrieved'] . 'UTC');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've probably already tested this but out of interest, what happens here if/when last_successfully_retrieved is null?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It resolves to the current time. I changed it in 7cb4d48

Comment thread api/api-cache.php
Comment thread sql/outgoing_request_cache.sql Outdated
PRIMARY KEY (id),
FOREIGN KEY (config_id) REFERENCES outgoing_request_cache_config(id)
)AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
)AUTO_INCREMENT=8 DEFAULT CHARSET=utf8;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the symbols in some Printful API responses are UTF-8. One of the errors lists \\xE2\\x80\\xB3\\xE2\\x80\\x93... as an issue. It might actually be a different charset, but given that it's not throwing errors over it anymore, I think UTF-8 will be fine.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP is always ISO-8859-1 charset, we can transform that into whatever client side charset we feel like. Latin1 should be fine, but if not we can update it to utf-8 if there's a specific problem.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue was inserting response contents into the table. MySQL was giving an "incorrect string value" error for some products. The character I mentioned above is the double prime character which I don't think is present in ISO-8859-1. It's in the product description for every hat product

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, the curl library must be converting to UTF-8 instead of the standard ISO-8859-1

Comment thread api/api-cache.php Outdated
require_once('../template/top.php');

// RegEx to match certain endpoints we want to ignore, e.g., orders
const IGNORED_ENDPOINTS = '/(?:orders)/i';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to have ignored endpoints, each entry in the db table would be a specific endpoint so if we want to ignore it we just won't make an entry for it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe the confusion here is that you have the code adding a cache entry automatically if its not found, we should not do this. Cache details need to be manually added to the database, if they don't exist and the retrieve cache method is called for that endpoint, we should just throw an error.

If we allow any call to create a cache entry, we can't enforce sensible TTLs, we'll get erroneously created entries, each time an endpoint or API base changes then suddenly we'll get duplicate config entries. We should stay on top of these things manually.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted in c289fad, specifically, here (line 73) and here (line 87)

Comment thread api/api-cache.php Outdated
/**
* @var bool $curl_executed true if curl_exec was called, false otherwise
*/
public $curl_executed;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curl_executed is a bad variable name, it could suggest executed is true anytime there isn't an error for example and in the future we may not stay with curl, be as verbose and specific as possible: perhaps, fetched_new_content or something

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in c289fad. Also changed the respective parameter in the constructor

Comment thread api/api-cache.php Outdated
}
$result = curl_exec($ch);
// add to cache if no errors and HTTP OK
if (!curl_errno($ch) && curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

success error codes are anything between 200 and 299, inclusive. for example, 201 and 204 are common success codes alongside 200

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread api/printful/printful.php Outdated
class PrintfulCustomAPI {
private $api_key;

private const PRINTFUL_CACHE_CONFIG_ID = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config ID should be unnecessary in the code, we can look everything up using the endpoint URL.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in c289fad. Using the retrieved config ID on line 82

Comment thread sql/outgoing_request_cache.sql Outdated
PRIMARY KEY (id),
FOREIGN KEY (config_id) REFERENCES outgoing_request_cache_config(id)
)AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
)AUTO_INCREMENT=8 DEFAULT CHARSET=utf8;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP is always ISO-8859-1 charset, we can transform that into whatever client side charset we feel like. Latin1 should be fine, but if not we can update it to utf-8 if there's a specific problem.

Comment thread api/printful/printful.php Outdated
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
$cache_result = get_valid_cache_entry($endpoint, $ch, $this::PRINTFUL_CACHE_CONFIG_ID);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the config ID being provided here, this is exactly what I was looking for 👍

Comment thread api/api-cache.php Outdated
*
FROM
api_cache
JOIN
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need to be a LEFT JOIN, a standard JOIN I think will return nothing at all if there's no entry in the api_cache table but there is an entry in the outgoing_request_cache_config table (i.e. JOIN expects both to exist to return anything, but the first time a cache is ever called there won't be an api_cache entry just yet)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in c289fad

Comment thread api/api-cache.php
if ($retrievalTime !== null && $now - strtotime($retrievalTime . 'UTC') < $ttl) {
return new CacheResult($r['content']);
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if $q is false we should probably throw an error

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in c289fad here. Would it be better to return null or treat it as though the endpoint wasn't cached in the first place (i.e., send the cURL request and return that result)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, yeah I'd say return null and error log for now.

@sebastian-king
Copy link
Copy Markdown
Member

Couple of general notes:

  1. I can't actually approve this PR because it's still in draft status
  2. You have a mixture of camel case and snake case in your variables names, we use snake case for method and variable names and pascal case for class names here

Comment thread api/printful/printful.php Outdated

curl_setopt($ch, CURLOPT_URL, 'https://api.printful.com/' . $URI);
$endpoint = 'https://api.printful.com/' . $URI;
curl_setopt($ch, CURLOPT_URL, $endpoint);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems tautological to pass $endpoint into the get_valid_cache_entry method and also set the endpoint before passing this in, we should probably just move this inside of the method.

@Kenneth-W-Chen Kenneth-W-Chen marked this pull request as ready for review August 15, 2024 00:30
@sebastian-king
Copy link
Copy Markdown
Member

This PR looks good now, can you include the insert SQL for the creation of the configs for printful?

@Kenneth-W-Chen
Copy link
Copy Markdown
Member Author

Should I put them in the same file as the table creation or make a separate cache_configs.sql file?

Also what would be a good TTL?

@sebastian-king
Copy link
Copy Markdown
Member

It should go in the sql/migrations/ directory with a filename like URW-<ticket num>-cache-configs.sql
For printful I'd go with a TTL of 24 hours

@sebastian-king
Copy link
Copy Markdown
Member

Oh, I see now that endpoint should be part of the config, not the cache content table. We should have a config entry for each individual endpoint we intend to use.

@Kenneth-W-Chen

This comment has been minimized.

Changed some SELECTs to grab fewer columns. Added newlines and indents to queries in PHP file to improve readability
@Kenneth-W-Chen

This comment has been minimized.

Comment thread api/api-cache.php Outdated
Rearranged some columns, specified default for columns of type timestamp because of MySQL server settings
Parameterized endpoint strings and added helper function to convert parameterized endpoint strings to the full version
Add varargs for send_request and api-cache functions
Reformatted code
@Kenneth-W-Chen
Copy link
Copy Markdown
Member Author

Kenneth-W-Chen commented Sep 1, 2024

60c91fe I changed content's type to mediumtext because text was too small for the Gear page, Specifically, the "Aerospace Division T" is over 200kB. No other product has this issue except for that one.

The specific cause is because the product has 28 variants, and Printful returns info for each variant for the single request. The data is repetitive (e.g., each variant lists if it's available in the US, UK, etc.). I can compress it into a normal zip down to 10kB or into a .7z to about 8kB.

Edit: Changed it to compress the data in 8c98739 since 20-200kB of JSON isn't human readable (or human parsable) in the first place

@Kenneth-W-Chen Kenneth-W-Chen mentioned this pull request Sep 2, 2024
13 tasks
@sebastian-king sebastian-king merged commit e4971cf into develop Nov 17, 2024
@Kenneth-W-Chen Kenneth-W-Chen deleted the URW-178-cache branch November 17, 2024 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants