Extending the Layer Class
Models in TensorFlow are made of layers connected into a graph. All implementations, including high-level APIs such as Keras and Sonnet, build on the base class tf.Module. Modules are essentially helper classes for implementing layers and models. A layer in TensorFlow follows a few simple rules: it typically implements an __init__, a build, and a call function. The build function creates the layer's variables once the input shape is known, and the call function defines the forward pass. Here’s the implementation of the neural layer presented in part I of this series, extending tf.keras.layers.Layer.
import tensorflow as tf

class MyDense(tf.keras.layers.Layer):
    def __init__(self, output_features, **kwargs):
        super().__init__(**kwargs)
        self.output_features = output_features

    def build(self, input_shape):
        # Variables are created here, once the size of the input is known.
        self.w = tf.Variable(tf.random.normal((input_shape[-1], self.output_features)), name='weights')
        self.b = tf.Variable(tf.zeros(self.output_features, dtype=tf.float32), name='biases')

    def call(self, inputs):
        # Forward pass: sigmoid(inputs . w + b)
        return tf.nn.sigmoid(tf.matmul(inputs, self.w) + self.b)
Instantiating the layer is as simple as:
my_dense = MyDense(output_features = 3)
However, the layer is built lazily: it is not built, and its variables are not created, until we call it on an input for the first time:
my_dense(tf.constant([[1., 1., 1.], [2., 2., 2.]]))
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.65940213, 0.33986932, 0.6029423 ],
[0.7893917 , 0.20953123, 0.6975124 ]], dtype=float32)>
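The deferred build can be observed directly. The following is a minimal sketch, assuming TensorFlow 2.x; the class name LazyDense and the sizes are hypothetical, and it uses add_weight instead of tf.Variable only to keep the example portable across Keras versions.

```python
import tensorflow as tf


class LazyDense(tf.keras.layers.Layer):  # hypothetical name, for illustration only
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Runs once, on the first call, when the input shape is known.
        self.w = self.add_weight(name="w",
                                 shape=(input_shape[-1], self.units),
                                 initializer="random_normal")
        self.b = self.add_weight(name="b",
                                 shape=(self.units,),
                                 initializer="zeros")

    def call(self, inputs):
        return tf.nn.sigmoid(tf.matmul(inputs, self.w) + self.b)


layer = LazyDense(units=4)
built_before = layer.built            # no variables exist yet
weights_before = len(layer.weights)

out = layer(tf.ones((2, 3)))          # the first call triggers build()
built_after = layer.built
weight_shapes = [tuple(w.shape) for w in layer.weights]
```

Before the first call the layer reports itself as not built and holds no weights; afterwards it holds a weight matrix shaped by the last input dimension and a bias vector.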
Extending the Model Class
Once we have a working custom layer, we can implement a custom model that uses it. The Model class is closely related to the Layer class, but builds on it to support the extended toolset TensorFlow provides for training, evaluating, saving, and loading models. Here’s the code to define a custom model with two layers:
class MyModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.layer_1 = MyDense(output_features = 3)
        self.layer_2 = MyDense(output_features = 1)

    def call(self, inputs):
        # Named inputs to avoid shadowing the Python builtin input().
        return self.layer_2(self.layer_1(inputs))
The following code instantiates the model and runs a forward pass:
my_custom_model = MyModel()
my_custom_model(tf.constant([[1., 1., 1.], [2., 2., 2.]]))
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.73259264, 0.51657116, 0.31847566],
[0.7387214 , 0.51708895, 0.31333414]], dtype=float32)>
We can always use summary() to get some basic structural information about our models:
my_custom_model.summary()
Model: "my_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
my_dense_3 (MyDense)         multiple                  6
_________________________________________________________________
my_dense_4 (MyDense)         multiple                  4
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
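The Param # column can be reproduced by hand: a dense layer such as MyDense holds an input-by-output weight matrix plus one bias per output feature. The sketch below uses a hypothetical helper name. The counts 6 and 4 in the summary are what this formula gives for a model built on inputs with a single feature; a three-feature input would give 12 and 4 instead.

```python
def dense_param_count(input_features, output_features):
    """Weights (input x output) plus one bias per output feature."""
    return input_features * output_features + output_features


first = dense_param_count(1, 3)    # first layer:  1 input feature -> 3 outputs
second = dense_param_count(3, 1)   # second layer: 3 inputs -> 1 output
total = first + second
```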
Custom gradients
As with the hybrid classical-quantum neural network we examine in our Quantum Computing article, there are cases where we need to go a bit deeper into customization. One such case is the need for a custom gradient definition. The tf.custom_gradient decorator allows fine-grained control over how the gradient is calculated. Here’s a simple example that defines y = x^2 and its gradient:
@tf.custom_gradient
def f(x):
    def grad(upstream):
        return upstream * 2 * x
    return x**2, grad
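The variable upstream is the gradient flowing back from the layers that follow the current one, that is, the gradient of the final result with respect to this function's output. Multiplying it by the local derivative 2x applies the chain rule.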
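To see what upstream contains, we can scale the output of the function before differentiating. This is a minimal sketch, assuming TensorFlow 2.x in eager mode; the function name square and the factor 5.0 are chosen only for illustration.

```python
import tensorflow as tf


@tf.custom_gradient
def square(x):
    def grad(upstream):
        # upstream is d(result)/d(square output); here it equals 5.0
        return upstream * 2 * x
    return x ** 2, grad


x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)              # constants are not watched automatically
    y = 5.0 * square(x)        # the factor 5.0 becomes the upstream gradient

dy_dx = tape.gradient(y, x)    # 5.0 * 2 * 3.0
```

The tape calls grad with upstream equal to 5.0, so the result is the chain-rule product of the outer factor and the local derivative 2x.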
Binding it all together into a Quantum Neural Layer
Let’s assume that information flows through a classical neural network, and that somewhere along the way we would like to inject a quantum layer composed of a circuit of qubits. The layer has to translate the classical data to quantum data, send the circuit to an actual quantum computer, receive the results, and translate them back to classical data, so that the calculation can continue in the classical space of the neural layers that follow.
First, how do we translate a classical value into qubit data? If we normalize the data flowing through the neural network to the range 0 to 1, we can treat each value as a probability. We can then use the arccos function to turn that probability into an angle, the angle theta of the qubit.
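import datetime
import numpy as np

def p_to_angle(self, p):
    try:
        # P(|0>) = cos^2(theta / 2) = p  =>  theta = 2 * arccos(sqrt(p))
        angle = 2 * np.arccos(np.sqrt(p))
    except Exception as e:
        timestamp = datetime.datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
        # Python 3 exceptions have no .message attribute, so use str(e).
        raise QiskitCircuitModuleException(
            QiskitCircuitModuleExceptionData(str({'timestamp': timestamp,
                                                  'function': 'p_to_angle',
                                                  'message': str(e)})))
    return angle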
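The mapping can be checked in isolation. The sketch below is a standalone version in plain NumPy; the inverse helper angle_to_p and the clipping step are additions for illustration and are not part of the original module. It relies on the fact that a qubit rotated by theta from |0> is measured as |0> with probability cos^2(theta / 2).

```python
import numpy as np


def p_to_angle(p):
    """Map a probability in [0, 1] to a qubit rotation angle theta."""
    p = np.clip(p, 0.0, 1.0)          # guard against values slightly out of range
    return 2 * np.arccos(np.sqrt(p))


def angle_to_p(theta):
    """Inverse mapping: probability of measuring |0> after rotating by theta."""
    return np.cos(theta / 2) ** 2


probabilities = np.array([0.0, 0.25, 0.5, 1.0])
angles = p_to_angle(probabilities)
recovered = angle_to_p(angles)
```

A probability of 1 maps to an angle of 0 (the qubit stays in |0>), a probability of 0 maps to pi, and converting back recovers the original probabilities.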
Next, we need to implement a layer that executes the forward calculation of the quantum circuit. Each qubit in this structure is the equivalent of a neuron. The layer can send the calculation either to a local simulator such as Aer or to IBMQ, where it queues for execution on actual quantum hardware:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class QuantumLayer(Layer):
    # use_parameter_shift_gradient_flow was referenced but never declared;
    # it is added as a keyword argument here, and its default is an assumption.
    def __init__(self, qubits=6, instructions=None, execute_on_IBMQ=False, shots=10,
                 use_parameter_shift_gradient_flow=False):
        super(QuantumLayer, self).__init__()
        self.use_parameter_shift_gradient_flow = use_parameter_shift_gradient_flow
        self.qubits = qubits
        self.instructions = instructions
        self.tensor_history = []
        self.execute_on_IBMQ = execute_on_IBMQ
        self.shots = shots
        # Wrapper that builds and executes the qubit circuit.
        self.circuit = QiskitCircuitModule(self.qubits,
                                           instructions=self.instructions,
                                           execute_on_IBMQ=self.execute_on_IBMQ,
                                           shots=self.shots)

    def build(self, input_shape):
        # Trainable weights mapping the input features onto the qubits.
        kernel_p_initialisation = tf.random_normal_initializer()
        self.kernel_p = tf.Variable(name="kernel_p",
                                    initial_value=kernel_p_initialisation(shape=(input_shape[-1],
                                                                                 self.qubits),
                                                                          dtype='float32'),
                                    trainable=True)
        # Phase angles, one per qubit, kept fixed during training.
        kernel_phi_initialisation = tf.zeros_initializer()
        self.kernel_phi = tf.Variable(name="kernel_phi",
                                      initial_value=kernel_phi_initialisation(shape=(self.qubits,),
                                                                              dtype='float32'),
                                      trainable=False)

    def call(self, inputs):
        try:
            output = self.quantum_flow(inputs)
        except QiskitCircuitModuleException as qex:
            raise qex
        return output
You can see in the implementation that __init__ holds the settings, outside TensorFlow, that are needed to configure execution. The build function just initializes the variables. Notice that kernel_phi is a non-trainable variable. The angle phi can be used for more complex types of applications, and we wanted to keep the training simple to start with; training of phi variables on qubits is nevertheless fully supported. The call() function has two execution modes, and the proper flow is the one that uses custom gradients, enabled by setting the parameter use_parameter_shift_gradient_flow to True. However, this increases execution time significantly, because for every epoch the circuit is sent to IBM Quantum three times instead of once.
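The shots setting controls how many times the circuit is measured, and therefore how noisy each qubit's output is. The sketch below is a classical stand-in written in plain NumPy, not the behaviour of quantum_execute, whose internals are not shown in this section. The function name estimate_p0 is hypothetical, and it assumes a single qubit rotated by theta from |0>.

```python
import numpy as np


def estimate_p0(theta, shots, rng):
    """Estimate P(|0>) for a qubit rotated by theta, from simulated measurements."""
    p0 = np.cos(theta / 2) ** 2              # exact probability of measuring |0>
    zeros = rng.binomial(n=shots, p=p0)      # number of shots that returned |0>
    return zeros / shots


rng = np.random.default_rng(0)
theta = 2 * np.arccos(np.sqrt(0.3))          # angle that encodes p = 0.3

coarse = estimate_p0(theta, shots=10, rng=rng)        # default shots of the layer
fine = estimate_p0(theta, shots=100_000, rng=rng)     # many shots, low noise
```

With 10 shots the estimate can only take values in steps of 0.1, so the layer's output is coarse; with many shots it converges towards the encoded probability.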
The problem here lies with the calculation of the partial derivatives. Can TensorFlow and GradientTape calculate the derivative? The tensor flowing through the neural network stops, is converted into quantum information, leaves the classical computing world, continues through a quantum computer, returns, is converted back into a classical tensor, and continues flowing within TensorFlow. Clearly this derivative cannot be calculated by the default GradientTape, so we need to override it and provide the calculation ourselves. The derivative calculation relies on a simple mathematical idea: when a function cannot be differentiated analytically at a point x, we can approximate its derivative from the values the function takes to the right and to the left of x. This is what the custom derivative does. It introduces a jitter (shift) value in the input of the quantum circuit and executes the circuit twice, once adding the shift and once subtracting it:
@tf.custom_gradient
def quantum_flow(self, x):
    output = tf.matmul(x, self.kernel_p)
    qubit_output = tf.reshape(
        tf.convert_to_tensor(
            self.circuit.quantum_execute(tf.reshape(output, [1, self.qubits]),
                                         self.kernel_phi)),
        (1, 1, self.qubits))
    output = qubit_output

    def grad(dy, variables=None):
        shift = np.pi / 2
        shift_right = x + np.ones(x.shape) * shift
        shift_left = x - np.ones(x.shape) * shift
        input_left = tf.matmul(shift_left, self.kernel_p)
        input_right = tf.matmul(shift_right, self.kernel_p)
        output_right = self.circuit.quantum_execute(tf.reshape(input_right, [1, self.qubits]), self.kernel_phi)
        output_left = self.circuit.quantum_execute(tf.reshape(input_left, [1, self.qubits]), self.kernel_phi)
        quantum_gradient = [output_right[i] - output_left[i] for i in range(len(output_right))]
        input_gradient = dy * quantum_gradient
        dy_input_gradient = tf.reshape(tf.matmul(input_gradient, tf.transpose(self.kernel_p)),
                                       shape=[1, 1, x.get_shape().as_list()[-1]])
        grd_w = []
        for i in range(self.qubits):
            w = self.kernel_p[:, i]
            w += dy_input_gradient
            grd_w.append(w)
        tf_grd_w = tf.convert_to_tensor(grd_w)
        tf_grd_w = tf.reshape(tf_grd_w, shape=(x.get_shape().as_list()[-1], self.qubits))
        return dy_input_gradient, [tf_grd_w]

    return output, grad
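The two shifted evaluations can be illustrated without a quantum backend. The sketch below is a simplified single-qubit model in plain NumPy, not the layer above: it assumes the circuit output is the expectation value cos(theta), which is what a rotation by theta from |0> gives for a Z measurement. The function names are hypothetical. It uses the textbook parameter-shift form, which halves the difference of the two evaluations, whereas the layer above uses the raw difference.

```python
import numpy as np


def expectation_z(theta):
    """Z expectation of a qubit rotated by theta from |0>: analytically cos(theta)."""
    return np.cos(theta)


def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Derivative of f at theta from two evaluations, shifted right and left."""
    return (f(theta + shift) - f(theta - shift)) / 2


theta = 0.7
estimated = parameter_shift_grad(expectation_z, theta)
exact = -np.sin(theta)      # analytic derivative of cos(theta)
```

For this kind of function the shift of pi / 2 gives the exact derivative and not just an approximation, which is why the circuit only needs to be executed twice per gradient.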
Whilst this is a rare and complex case, we see that TensorFlow still provides the tools to create a structure that is fully compatible with the framework and uses its full potential.