Using the BLSloadR File Cache
Overview of BLS File Structure
BLSloadR streamlines access to data from the U.S. Bureau
of Labor Statistics (BLS). The primary source of this data is the BLS
“flat files”, published at https://download.bls.gov/pub/time.series/.
These are plain text files organized like a relational database, and
they fall into three main types:
- Data files, which contain the values associated with discrete data series.
- Series files, which map each series code used in the data files to a set of lookup codes.
- Lookup files, which map the series lookup codes to descriptive values.
Some databases split their data across multiple data files, each acting as a slice of the larger data set. Others use an “aspect” file to add further dimensions to the data.
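To make the relationship between the file types concrete, here is a minimal base R sketch that joins toy tables shaped like a data file, a series file, and one lookup file. The column names (series_id, area_code, area_text) and all values are illustrative assumptions; real BLS databases have more columns and usually several lookup files.

```r
# Toy tables standing in for the three file types; all values are made up.
data_file <- data.frame(
  series_id = c("S1", "S1", "S2"),
  year      = c(2024L, 2024L, 2024L),
  period    = c("M01", "M02", "M01"),
  value     = c(4.1, 4.0, 3.7)
)
series_file <- data.frame(
  series_id = c("S1", "S2"),
  area_code = c("A01", "A02")
)
area_lookup <- data.frame(
  area_code = c("A01", "A02"),
  area_text = c("Area One", "Area Two")
)

# Data -> series via the series code, then series -> lookup via the lookup code.
joined <- merge(data_file, series_file, by = "series_id")
joined <- merge(joined, area_lookup, by = "area_code")
joined
```

Each additional lookup file would add one more merge() on its own lookup code.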
File Caching in BLSloadR
This article describes the optional file cache in BLSloadR. When
enabled, the cache downloads files from the BLS to local storage in
order to:
- Keep a local copy of the data in case of network disruption.
- Reduce network overhead when you access BLS data more often than BLS updates it.
To preserve existing behavior, file caching is not enabled by default
in BLSloadR. There are two ways to use it: as a one-off argument to a
supported function, or by setting an environment variable in your
.Renviron file. When file caching is enabled, BLSloadR takes the
following steps:
- Check the size and last-modified date of the remote file.
- Look for the corresponding file in your local cache.
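The comparison between the remote and local file can be sketched in base R as follows. needs_download() is a hypothetical helper, not part of BLSloadR, and the remote size and last-modified time are passed in as arguments; in practice such values would typically come from the Content-Length and Last-Modified headers of an HTTP HEAD request.

```r
# Hypothetical helper (not BLSloadR code): should the cached copy be replaced?
needs_download <- function(local_path, remote_size, remote_modified) {
  # No cached copy yet: always download.
  if (!file.exists(local_path)) {
    return(TRUE)
  }
  info <- file.info(local_path)
  # Replace the cached copy if it is older than the remote file
  # or its size in bytes does not match.
  local_older  <- info$mtime < remote_modified
  size_differs <- info$size != remote_size
  local_older || size_differs
}

# Example with a temporary file standing in for a cached BLS file.
cached <- tempfile(fileext = ".txt")
writeLines("series_id\tyear\tperiod\tvalue", cached)
cached_size <- file.info(cached)$size

needs_download(cached, cached_size, Sys.time() - 3600)       # FALSE: same size, local copy newer
needs_download(cached, cached_size + 10, Sys.time() - 3600)  # TRUE: size differs
needs_download(tempfile(), 100, Sys.time())                  # TRUE: no local copy
```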
If the local file does not exist, or if it is older than or a different size from the BLS file, BLSloadR downloads a fresh copy to the cache and reads it. Otherwise, it reads the local file. Functions currently supporting a file cache are:
Using File Cache Case-by-Case
Supported functions in BLSloadR include a
cache argument, which defaults to FALSE. To use the file
cache on a case-by-case basis, set cache = TRUE in the
function call. Note that later calls that do not set cache = TRUE
will ignore any cached files: a function only checks the cache when
the argument is TRUE.
Enable Default File Caching
To enable file caching by default, set an environment variable in
your .Renviron file by adding the following line:
USE_BLS_CACHE="TRUE"
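R reads the user-level .Renviron file at startup, so the setting applies to new sessions. If you use the usethis package, usethis::edit_r_environ() opens that file for editing. The sketch below writes the same line to a temporary file, standing in for your real .Renviron, and loads it with base R's readRenviron() to show how R parses it. To enable the cache for the current session only, Sys.setenv(USE_BLS_CACHE = "TRUE") has the same effect.

```r
# A temporary file stands in for ~/.Renviron, so nothing on disk is changed.
renviron_path <- tempfile(fileext = ".Renviron")
writeLines('USE_BLS_CACHE="TRUE"', renviron_path)

# readRenviron() sets each name=value pair in the file for this session;
# the surrounding quotes are not kept in the value.
readRenviron(renviron_path)
Sys.getenv("USE_BLS_CACHE")
```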
When file caching is enabled by default, you can still bypass the
cache for an individual call by passing cache = FALSE to a supported
function.
Controlling Cache Location
The location of the BLS file cache is controlled by the
BLS_CACHE_DIR environment variable. If this variable is not
set and the cache is used, BLSloadR stores files in the folder
given by tools::R_user_dir("BLSloadR", which = "cache"). You can
check the cache directory with the helper function
bls_get_cache_dir().
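The lookup order described above can be mirrored in a few lines of base R. cache_dir_sketch() is a hypothetical stand-in for illustration only; bls_get_cache_dir() reports the directory BLSloadR actually uses. Note that tools::R_user_dir() requires R 4.0.0 or later. To move the cache permanently, a BLS_CACHE_DIR line can go in .Renviron next to USE_BLS_CACHE.

```r
# Hypothetical stand-in (not BLSloadR code) for the cache-directory lookup.
cache_dir_sketch <- function() {
  # Prefer BLS_CACHE_DIR when it is set to a non-empty value.
  custom_dir <- Sys.getenv("BLS_CACHE_DIR", unset = "")
  if (nzchar(custom_dir)) {
    return(custom_dir)
  }
  # Otherwise fall back to the per-user cache directory for the package.
  tools::R_user_dir("BLSloadR", which = "cache")
}

cache_dir_sketch()
```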