Getting Started
This guide introduces key concepts in GOV.UK Content API through examples. It uses curl to interact with the API on the command line, because curl is widely available. You may prefer the structured output of HTTPie, or piping the curl responses through jq.
Accessing Content
GOV.UK Content API provides access to content hosted on www.gov.uk (referred to as GOV.UK). Any given HTML page, for example VAT Rates, can be looked up through the API:
curl https://www.gov.uk/api/content/vat-rates
This will return a ContentItem object. Within this object are fields that describe the content itself, metadata and associations with other content. This section explains some of the key fields conceptually.
Base path
{
...
"base_path": "/vat-rates",
...
}
Every ContentItem has a base_path field, which is the same as the path parameter used to identify the content.
The “base” part of the name indicates that this is the root path of the piece of content, because some pieces of content span multiple pages.
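Because the base_path matches the path used in the API request, the API URL for a piece of content can be derived from it. The following is a minimal Ruby sketch, assuming the API root is the https://www.gov.uk/api/content prefix seen in the curl example. The helper names api_url_for and fetch_content_item are illustrative and not part of the API.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed API root, taken from the curl example earlier in this guide.
API_ROOT = "https://www.gov.uk/api/content"

# Builds the API URL for a base_path such as "/vat-rates" (hypothetical helper).
def api_url_for(base_path)
  "#{API_ROOT}/#{base_path.delete_prefix('/')}"
end

# Fetches and parses a ContentItem; raises if the response is not a success.
# Redirects are not followed here, so a 3xx response also raises.
def fetch_content_item(base_path)
  response = Net::HTTP.get_response(URI(api_url_for(base_path)))
  raise "Unexpected response: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

With network access, fetch_content_item("/vat-rates") would be expected to return a hash whose "base_path" value is "/vat-rates".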
Document type and schema name
{
...
"document_type": "answer",
"schema_name": "answer",
...
}
The document_type and schema_name fields define the format of the data represented in a ContentItem. The main fields affected by this are details and links.
The schema_name refers to a file in GOV.UK Content Schemas, which uses JSON Schema to describe the rules of the data. The document_type defines the name of the format on GOV.UK. These fields often share the same value, as there is often only one schema for a particular document_type. This is the case with VAT Rates, where the content returned uses the answer schema.
Details
{
...
"details": {
"body": "\n<div class=\"highlight-answer\">\n<p>The standard <abbr title=\"Value Added Tax\">VAT</abbr> rate is <em>20%</em></p>\n</div>\n\n<h2 id=\"vat-rates-for-goods-and-services\">\n<abbr title=\"Value Added Tax\">VAT</abbr> rates for goods and services</h2>\n\n<table>\n <thead>\n <tr>\n <th>Rate</th>\n <th>% of <abbr title=\"Value Added Tax\">VAT</abbr>\n</th>\n <th>What the rate applies to</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Standard</td>\n <td>20%</td>\n <td>Most goods and services</td>\n </tr>\n <tr>\n <td>Reduced rate</td>\n <td>5%</td>\n <td>Some goods and services, eg children’s car seats and home energy</td>\n </tr>\n <tr>\n <td>Zero rate</td>\n <td>0%</td>\n <td>Zero-rated goods and services, eg most food and children’s clothes</td>\n </tr>\n </tbody>\n</table>\n\n<p>The standard rate of <abbr title=\"Value Added Tax\">VAT</abbr> increased to 20% on 4 January 2011 (from 17.5%).</p>\n\n<p>Some things are exempt from <abbr title=\"Value Added Tax\">VAT</abbr>, eg postage stamps, financial and property transactions.</p>\n\n<p>The <a href=\"/vat-businesses/vat-rates\"><abbr title=\"Value Added Tax\">VAT</abbr> rate businesses charge</a> depends on their goods and services.</p>\n\n<p>Check the <a href=\"https://www.gov.uk/rates-of-vat-on-different-goods-and-services\">rates of <abbr title=\"Value Added Tax\">VAT</abbr></a> on different goods and services.</p>\n\n",
"external_related_links": []
},
...
}
The details field of a ContentItem is an object that contains data in a structure relevant to the type of content. This structure is described by the schema, which is itself identified by the schema_name field.
In the case of VAT Rates an answer schema is used. The answer schema allows a body value, which contains the HTML for the page, and an external_related_links value, which is an array that can be used to provide links to associated pages hosted off GOV.UK.
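Since the shape of details depends on the schema, code that reads it can check schema_name first. The sketch below assumes a ContentItem that has already been parsed into a Ruby hash. The helper names and the abbreviated sample data are illustrative only.

```ruby
# Returns the HTML body of an "answer" ContentItem, or nil for other schemas.
# `item` is a parsed ContentItem hash (for example from JSON.parse).
def answer_body(item)
  return nil unless item["schema_name"] == "answer"

  item.dig("details", "body")
end

# Returns the external related links array, defaulting to an empty array.
def external_related_links(item)
  item.dig("details", "external_related_links") || []
end

# Abbreviated sample data in the shape shown above (not a full API response).
sample_item = {
  "schema_name" => "answer",
  "details" => {
    "body" => "<p>The standard VAT rate is 20%</p>",
    "external_related_links" => []
  }
}

puts answer_body(sample_item)
```

Returning nil for other schemas keeps callers from assuming a body value exists where the schema may not define one.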
Links
The links field is used to describe relationships between content. It is presented as an object where each key represents a link type and each value is an array of LinkedContentItems. A LinkedContentItem is effectively a condensed ContentItem.
The values that can be used for link types are defined within the schema associated with the content. The link type describes the type of relationship between content. For example, with VAT Rates there is a relationship with a link type of mainstream_browse_pages. The purpose of this link type is to link to pages that help users find similar content suitable for a mainstream audience.
{
...
"links": {
...
"mainstream_browse_pages": [
{
"api_path": "/api/content/browse/tax/vat",
"api_url": "https://www.gov.uk/api/content/browse/tax/vat",
"base_path": "/browse/tax/vat",
"content_id": "895d337a-fa68-4c83-ab79-1c08016afe87",
"description": "Includes online returns, rates, charging and record keeping",
"document_type": "mainstream_browse_page",
"links": {},
"locale": "en",
"public_updated_at": "2015-06-24T13:56:39Z",
"schema_name": "mainstream_browse_page",
"title": "VAT",
"web_url": "https://www.gov.uk/browse/tax/vat",
"withdrawn": false
}
],
...
},
...
}
You may also notice in the above example that there is a links field inside the LinkedContentItem object. This is available because it can be useful to have a tree of linked content. A common use of this is to show the hierarchy of pages through the parent link type.
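The nested links structure can be walked with a few lines of Ruby. The sketch below assumes a parsed ContentItem hash. The helper names are hypothetical, and the sample data (including the parent chain) is invented for illustration rather than taken from a real response.

```ruby
# Returns the titles of the items linked under the given link type.
def linked_titles(item, link_type)
  item.fetch("links", {}).fetch(link_type, []).map { |linked| linked["title"] }
end

# Walks the "parent" link type upwards and returns titles from the root down.
# Assumes only the first parent at each level matters; others are ignored.
def breadcrumb(item)
  parent = item.dig("links", "parent", 0)
  trail = parent ? breadcrumb(parent) : []
  trail + [item["title"]]
end

# Illustrative sample data; the parent hierarchy here is made up.
sample_item = {
  "title" => "VAT rates",
  "links" => {
    "mainstream_browse_pages" => [
      { "title" => "VAT", "base_path" => "/browse/tax/vat", "links" => {} }
    ],
    "parent" => [
      { "title" => "VAT", "links" => { "parent" => [{ "title" => "Tax", "links" => {} }] } }
    ]
  }
}

p linked_titles(sample_item, "mainstream_browse_pages")
p breadcrumb(sample_item)
```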
Making use of content
The example below uses Ruby on Rails with Rest Client to make use of GOV.UK content in your application.
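require "json"
require "rest-client"

# Cache the body for an hour so that repeated page views do not call the API each time.
tax_help = Rails.cache.fetch("govuk/tax-help", expires_in: 1.hour) do
  # Request JSON via the Accept header; a GET has no request body for content_type to describe.
  response = RestClient.get("https://www.gov.uk/api/content/tax-help", { accept: :json })
  JSON.parse(response.body).dig("details", "body")
end

content = "<h1>GOV.UK Tax Help</h1><div>#{tax_help}</div>"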
In this example we use the Rails cache layer so that we access the API infrequently and need not worry about hitting the rate limit.
We then use the API to access the content for Tax Help. From the response we read the body field within the details object and store it in a variable, tax_help.
Finally we embed this in our own HTML and are ready to output to users.
Content that spans multiple pages
Sometimes a piece of content served by this API is used to render multiple pages on GOV.UK. This occurs when the same content may be presented in different scenarios.
An example of this can be found with travel advice. The local laws and customs for Thailand can be accessed as a single page and are also available as part of a complete printable guide. Within the API this is powered by a single piece of content available at /api/content/foreign-travel-advice/thailand.
You can, however, also request this content using the path of the page you wish to access, /foreign-travel-advice/thailand/local-laws-and-customs. In this case you will receive a 303 redirect response to the canonical source of the content.
$ curl -I https://www.gov.uk/api/content/foreign-travel-advice/thailand/local-laws-and-customs
HTTP/1.1 303 See Other
...
Location: https://www.gov.uk/api/content/foreign-travel-advice/thailand
...
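Ruby's Net::HTTP.get_response does not follow redirects by itself, so a client using it needs to handle the 303 explicitly. A minimal sketch of one way to do this follows. The helper names and the redirect limit of 3 are assumptions made for illustration.

```ruby
require "net/http"
require "uri"

# Returns the absolute URL a redirect response points at, or nil if the
# response is not a redirect or carries no Location header.
# `request_url` is used to resolve a relative Location value.
def redirect_target(request_url, response)
  location = response["location"]
  return nil unless response.is_a?(Net::HTTPRedirection) && location

  URI.join(request_url, location).to_s
end

# Requests `url`, following up to `limit` redirects (hypothetical helper).
def fetch_following_redirects(url, limit: 3)
  response = Net::HTTP.get_response(URI(url))
  target = redirect_target(url, response)
  return response unless target
  raise "Too many redirects" if limit.zero?

  fetch_following_redirects(target, limit: limit - 1)
end
```

Capping the number of redirects guards against loops. A request for the local-laws-and-customs path would be expected to end at the canonical /api/content/foreign-travel-advice/thailand URL after a single hop.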
Withdrawn content
Some content on GOV.UK provides information that is no longer applicable but has been kept for posterity. This content is described as being “withdrawn”.
When browsing GOV.UK you may see withdrawn content, which is introduced by a header describing its status. An example of this is The complete routine immunisation schedule 2013 to 2014, which was withdrawn because the content had become out of date.
Withdrawn status is represented in GOV.UK Content API through the withdrawn_notice field. We can look up the value for this using the above example.
curl https://www.gov.uk/api/content/government/publications/the-complete-routine-immunisation-schedule-201314
Within the returned ContentItem object there is a WithdrawnNotice object that describes the withdrawn status.
"withdrawn_notice": {
"explanation": "<div class=\"govspeak\"><p>This document is out of date. See the current <a href=\"https://www.gov.uk/government/publications/the-complete-routine-immunisation-schedule\">complete routine immunisation schedule</a>.</p>\n</div>",
"withdrawn_at": "2015-08-12T13:47:11Z"
}
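A client can check for withdrawal before displaying content. The sketch below assumes a parsed ContentItem hash and assumes that content which has not been withdrawn carries no withdrawn_at value. That assumption and the helper names are illustrative, not confirmed API behaviour.

```ruby
require "time"

# Returns the withdrawal time as a Time, or nil when no withdrawn_at is present.
def withdrawn_at(item)
  timestamp = item.dig("withdrawn_notice", "withdrawn_at")
  timestamp && Time.iso8601(timestamp)
end

# Treats an item as withdrawn only when it has a withdrawn_at timestamp
# (an assumption about how non-withdrawn content is represented).
def withdrawn?(item)
  !withdrawn_at(item).nil?
end

# Sample data using the timestamp from the response shown above.
sample_item = {
  "withdrawn_notice" => { "withdrawn_at" => "2015-08-12T13:47:11Z" }
}

puts "Withdrawn on #{withdrawn_at(sample_item).to_date}" if withdrawn?(sample_item)
```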