Purpose

This tutorial is the second in a series on Tick History data inside Google BigQuery.

The preceding tutorial introduced the underlying technology, and explained how to set-up an environment in the Google Cloud Platform (GCP) console. It also covered the data schemas, and how to examine the data. It also listed the prerequisites for you to be able to do all this.

This tutorial explains how to retrieve tick data, using SQL.

The following tutorial introduces actual analytics on the data.

For your convenience, a set of sample queries is available in a downloadable file.

Quotes retrieval using a simple SQL query

You can type the SQL commands directly in the query window. Note that the console editor validates the syntax and will flag errors.

Let us start by simply retrieving some quotes:

    	
            

#StandardSQL

SELECT

  Date_Time, RIC, Bid_Price, Bid_Size, Ask_Price, Ask_Size, Qualifiers

FROM

  `dbd-sdlc-prod.LSE_NORMALISED.LSE_NORMALISED`

WHERE

  RIC LIKE "VOD.L"

  AND (Date_Time BETWEEN TIMESTAMP('2019-09-04 00:00:00.000000') AND

                         TIMESTAMP('2019-09-04 23:59:59.999999'))

  AND Type = "Quote"

  AND ((Bid_Price IS NOT NULL AND Bid_Size IS NOT NULL) OR

       (Ask_Price IS NOT NULL AND Ask_Size IS NOT NULL))

ORDER BY

  RIC, Date_Time

LIMIT

  100000

The first instruction in our query is for the Google Platform, to specify which SQL dialect we are using. Both the standard and legacy dialects are supported by BigQuery, the default depends on the interface you use to access BigQuery (see Switching SQL dialects for more information).

The SELECT statement is for a few relevant fields, taken from the LSE normalised table view. It also limits the results to a single instrument and day.

Tick data contains various data record types: Quote, Trade, Correction, Auction, Mkt. Condition.

Here we specify we are interested in quotes, and further narrow down the results to those that contain non-null prices and volumes.

As you see, the query is self-explanatory.

After running it, we see the results:

Notes:

  • Results can easily be exported to various formats and locations using the SAVE RESULTS button.
  • Returned time stamps are in UTC.

Trades retrieval using a simple SQL query

We only need to make small changes to the preceding query to retrieve trades:

    	
            

#StandardSQL

SELECT

  Date_Time, RIC, Price, Volume, Qualifiers

FROM

  `dbd-sdlc-prod.LSE_NORMALISED.LSE_NORMALISED`

WHERE

  RIC LIKE "VOD.L"

  AND (Date_Time BETWEEN TIMESTAMP('2019-09-04 00:00:00.000000') AND

                         TIMESTAMP('2019-09-04 23:59:59.999999'))

  AND Type = "Trade"

  AND Volume IS NOT NULL

  AND Price IS NOT NULL

ORDER BY

  RIC, Date_Time

LIMIT

  100000

We changed the tick type to Trade, and selected fields that are relevant for trades (price and volume).

After running it, we see the results:

Filtering quotes or trades for use in your analytics

Depending on the use case, you might want to apply an analysis to all quotes or trades, or, on the contrary, filter some out, because they are irrelevant to your analysis and might influence the calculation you want to perform.

Ticks with null values are easy to detect and filter out, as we saw in the code above.

Other typical examples of ticks you might want to filter could be for instance Previous Day, Auction, Off Book and Closing trades. These can be filtered by using the Qualifiers field, but this can be tricky:

  • The filters to apply might depend on your use case.
  • Qualifiers vary depending on the data venue, each exchange has its own codes.
  • Exchanges might have changed their qualifiers through time, maybe even several times.
  • Qualifier codes are not necessarily easy to understand.

To analyse and understand the qualifiers for a particular venue, you have several resources:

  • The exchange delivered documentation.
  • If you have Eikon, you can check out the SPEED GUIDE pages to find information.
  • The data itself. You can query the ticks for an entire day, and check which qualifiers are used inside / outside trading hours. You also need to check if there are any qualifiers specific to auction periods. The qualifiers you will usually want to filter out are those used only outside trading hours, or only during auction periods. Make spot checks on data over several years, to see if the qualifiers content changed in time.

Finally, validate your findings and filters with a data specialist before using them, to ensure correct results.

Note: the filters used in this tutorial set and the downloadable sample set are examples to demonstrate how filters are implemented from a purely technical perspective. They were created based only on LSE and PAR, and were not validated by a data specialist. You need to create your own filters, adapted to your use cases.

Market depth retrieval using a simple SQL query

Let us now retrieve the top of the order book. For that we need to select the NORMALISEDLL2 data view. Here is an example to retrieve the top 3 levels, i.e. the 3 top best prices:

    	
            

#StandardSQL

SELECT

  Date_Time,

  RIC,

  L1_BidPrice, L1_BidSize, L1_AskPrice, L1_AskSize,

  L2_BidPrice, L2_BidSize, L2_AskPrice, L2_AskSize,

  L3_BidPrice, L3_BidSize, L3_AskPrice, L3_AskSize

FROM

  `dbd-sdlc-prod.LSE_NORMALISEDLL2.LSE_NORMALISEDLL2`

WHERE

  RIC LIKE "VOD.L"

  AND (Date_Time BETWEEN TIMESTAMP('2019-09-04 00:00:00.000000') AND

                         TIMESTAMP('2019-09-04 23:59:59.999999'))

  AND (

       ((L1_BidPrice IS NOT NULL AND L1_BidSize IS NOT NULL) OR

        (L1_AskPrice IS NOT NULL AND L1_AskSize IS NOT NULL))

    OR ((L2_BidPrice IS NOT NULL AND L2_BidSize IS NOT NULL) OR

        (L2_AskPrice IS NOT NULL AND L2_AskSize IS NOT NULL))

    OR ((L3_BidPrice IS NOT NULL AND L3_BidSize IS NOT NULL) OR

        (L3_AskPrice IS NOT NULL AND L3_AskSize IS NOT NULL))

  )

ORDER BY

  RIC, Date_Time

LIMIT

  100000

The available depth of the book depends on the venue. LSE delivers 10 levels, of which we only retrieve 3 in this code sample.

After running it, we see the results:

There are some null values here because the data in this display is the first data of the day, with the very first offers. That is why not all fields have data. When you scroll down further you see that the book fills up, there are no more null values.

Next step

We have seen how to retrieve and filter data using SQL queries. Please proceed to the next tutorial that covers data analytics using SQL.