This tutorial is the second in a series on Tick History data inside Google BigQuery.
The preceding tutorial introduced the underlying technology, and explained how to set-up an environment in the Google Cloud Platform (GCP) console. It also covered the data schemas, and how to examine the data. It also listed the prerequisites for you to be able to do all this.
This tutorial explains how to retrieve tick data, using SQL.
The following tutorial introduces actual analytics on the data.
For your convenience, a set of sample queries is available in a downloadable file.
You can type the SQL commands directly in the query window. Note that the console editor validates the syntax and will flag errors.
Let us start by simply retrieving some quotes:
#StandardSQL
SELECT
Date_Time, RIC, Bid_Price, Bid_Size, Ask_Price, Ask_Size, Qualifiers
FROM
`dbd-sdlc-prod.LSE_NORMALISED.LSE_NORMALISED`
WHERE
RIC LIKE "VOD.L"
AND (Date_Time BETWEEN TIMESTAMP('2019-09-04 00:00:00.000000') AND
TIMESTAMP('2019-09-04 23:59:59.999999'))
AND Type = "Quote"
AND ((Bid_Price IS NOT NULL AND Bid_Size IS NOT NULL) OR
(Ask_Price IS NOT NULL AND Ask_Size IS NOT NULL))
ORDER BY
RIC, Date_Time
LIMIT
100000
The first instruction in our query is for the Google Platform, to specify which SQL dialect we are using. Both the standard and legacy dialects are supported by BigQuery, the default depends on the interface you use to access BigQuery (see Switching SQL dialects for more information).
The SELECT statement is for a few relevant fields, taken from the LSE normalised table view. It also limits the results to a single instrument and day.
Tick data contains various data record types: Quote, Trade, Correction, Auction, Mkt. Condition.
Here we specify we are interested in quotes, and further narrow down the results to those that contain non-null prices and volumes.
As you see, the query is self-explanatory.
After running it, we see the results:
Notes:
We only need to make small changes to the preceding query to retrieve trades:
#StandardSQL
SELECT
Date_Time, RIC, Price, Volume, Qualifiers
FROM
`dbd-sdlc-prod.LSE_NORMALISED.LSE_NORMALISED`
WHERE
RIC LIKE "VOD.L"
AND (Date_Time BETWEEN TIMESTAMP('2019-09-04 00:00:00.000000') AND
TIMESTAMP('2019-09-04 23:59:59.999999'))
AND Type = "Trade"
AND Volume IS NOT NULL
AND Price IS NOT NULL
ORDER BY
RIC, Date_Time
LIMIT
100000
We changed the tick type to Trade, and selected fields that are relevant for trades (price and volume).
After running it, we see the results:
Depending on the use case, you might want to apply an analysis to all quotes or trades, or, on the contrary, filter some out, because they are irrelevant to your analysis and might influence the calculation you want to perform.
Ticks with null values are easy to detect and filter out, as we saw in the code above.
Other typical examples of ticks you might want to filter could be for instance Previous Day, Auction, Off Book and Closing trades. These can be filtered by using the Qualifiers field, but this can be tricky:
To analyse and understand the qualifiers for a particular venue, you have several resources:
Finally, validate your findings and filters with a data specialist before using them, to ensure correct results.
Note: the filters used in this tutorial set and the downloadable sample set are examples to demonstrate how filters are implemented from a purely technical perspective. They were created based only on LSE and PAR, and were not validated by a data specialist. You need to create your own filters, adapted to your use cases.
Let us now retrieve the top of the order book. For that we need to select the NORMALISEDLL2 data view. Here is an example to retrieve the top 3 levels, i.e. the 3 top best prices:
#StandardSQL
SELECT
Date_Time,
RIC,
L1_BidPrice, L1_BidSize, L1_AskPrice, L1_AskSize,
L2_BidPrice, L2_BidSize, L2_AskPrice, L2_AskSize,
L3_BidPrice, L3_BidSize, L3_AskPrice, L3_AskSize
FROM
`dbd-sdlc-prod.LSE_NORMALISEDLL2.LSE_NORMALISEDLL2`
WHERE
RIC LIKE "VOD.L"
AND (Date_Time BETWEEN TIMESTAMP('2019-09-04 00:00:00.000000') AND
TIMESTAMP('2019-09-04 23:59:59.999999'))
AND (
((L1_BidPrice IS NOT NULL AND L1_BidSize IS NOT NULL) OR
(L1_AskPrice IS NOT NULL AND L1_AskSize IS NOT NULL))
OR ((L2_BidPrice IS NOT NULL AND L2_BidSize IS NOT NULL) OR
(L2_AskPrice IS NOT NULL AND L2_AskSize IS NOT NULL))
OR ((L3_BidPrice IS NOT NULL AND L3_BidSize IS NOT NULL) OR
(L3_AskPrice IS NOT NULL AND L3_AskSize IS NOT NULL))
)
ORDER BY
RIC, Date_Time
LIMIT
100000
The available depth of the book depends on the venue. LSE delivers 10 levels, of which we only retrieve 3 in this code sample.
After running it, we see the results:
There are some null values here because the data in this display is the first data of the day, with the very first offers. That is why not all fields have data. When you scroll down further you see that the book fills up, there are no more null values.
We have seen how to retrieve and filter data using SQL queries. Please proceed to the next tutorial that covers data analytics using SQL.