Transformer User Guide

Created by qxiao; last updated by qxiao on 2022-09-15 13:09

Introduction

Transformer: Attention Is All You Need

The naive Transformer implemented here for financial time series prediction follows the paper "Attention Is All You Need". Given an input of shape (N, T, F), where N is the batch size, T the number of time steps, and F the number of features, the model consists of:

An embedding layer that maps the input (N, T, F) to a representation (N, T, F');
A positional encoding layer that adds positional information to the representation (sinusoidal encodings in the paper);
An encoder made of several encoding layers, each of which uses a self-attention layer as its computing module (a function of query, key, and value);
A decoder, an MLP (or a single Linear layer), that maps the representation of the last time step (N, 1, F') to the output (N, 1).
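The positional encoding and self-attention steps listed above can be sketched in plain Python. This is a minimal illustration for a single sequence, not the bigmodels implementation: it uses the sinusoidal encoding from the paper, omits the learned query/key/value projections, and the helper names (`sinusoidal_positions`, `softmax`, `self_attention`) as well as the toy sizes are assumptions made for this example.

```python
import math

def sinusoidal_positions(seq_len, dim):
    """Paper-style encoding: sine on even feature indices, cosine on odd ones."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product attention with query = key = value = x.
    Learned projection matrices are omitted here (assumption, for brevity)."""
    dim = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(dim)])
    return out

T, F_EMBED = 5, 8  # toy sizes: 5 time steps, 8 embedded features
embedded = [[0.1 * (t + 1)] * F_EMBED for t in range(T)]  # stand-in for the embedding output
pe = sinusoidal_positions(T, F_EMBED)
encoder_input = [[e + p for e, p in zip(row_e, row_p)]
                 for row_e, row_p in zip(embedded, pe)]
attended = self_attention(encoder_input)
last_step = attended[-1]  # the decoder reads only the last time step
```

Each output row is a weighted average of all time steps, which is why the representation of the last step alone can summarize the whole window before it is passed to the decoder.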

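Model Definition

Definition

Initializes the Transformer model.

Parameters

input_dim - number of input features

output_dim - number of output features

max_seq - maximum window size of the training sequences

embed_dim - the Transformer's d_model

nhead - number of attention heads, default 8

num_layers - number of encoder layers in the Transformer, default 4

dropout - dropout rate, default 0.1

Example code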

from bigmodels.models.transformer import Transformer

model = Transformer(input_dim=98,
                    output_dim=1,
                    max_seq=5,
                    embed_dim=128,
                    nhead=8,
                    num_layers=4,
                    dropout=0.1)
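One constraint is worth checking before constructing the model. Standard multi-head attention (for example PyTorch's `nn.MultiheadAttention`) splits d_model evenly across the heads, so `embed_dim` must be divisible by `nhead`. The sketch below assumes the bigmodels encoder follows that convention, which the source does not state; `head_dim` is a hypothetical helper written for this example.

```python
def head_dim(embed_dim: int, nhead: int) -> int:
    """Return the per-head feature size, assuming d_model is split evenly across heads."""
    if embed_dim % nhead != 0:
        raise ValueError(f"embed_dim={embed_dim} is not divisible by nhead={nhead}")
    return embed_dim // nhead

per_head = head_dim(embed_dim=128, nhead=8)
print(per_head)  # 16
```

With the values from the example, each of the 8 heads attends over a 16-dimensional slice of the 128-dimensional representation.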

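Model Compilation

Definition

After the model is defined, the learning process is configured by compiling it (compile). Compilation accepts several parameters, including the optimizer, the loss function, and the evaluation metrics.

Parameters

optimizer - optimizer

criterion - loss function

metrics - evaluation functions

device - whether to run on the GPU; the CPU is used by default

Example code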

model.compile()
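The difference between a loss function (criterion) and an evaluation metric can be illustrated without any framework. The sketch below is only conceptual: the source does not say which object types `compile` accepts, so these plain functions (`mse`, `mae`) and the sample values are assumptions and are not meant to be passed to bigmodels as-is.

```python
def mse(pred, target):
    """Mean squared error, a typical regression loss minimized during training."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mae(pred, target):
    """Mean absolute error, here used as a metric that is only reported."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

pred = [0.5, -0.2, 0.1]    # hypothetical model outputs
target = [0.4, 0.0, 0.3]   # hypothetical labels

loss_value = mse(pred, target)
metric_value = mae(pred, target)
```

The loss drives the parameter updates, while metrics are computed alongside it to monitor quality and do not affect the gradients.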

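Model Training

Definition

Trains the model on a dataset (Dataset).

Parameters

x_train - feature data of the training set

y_train - labels of the training set

val_data - validation dataset

epochs - number of training epochs, default 10

batch_size - number of samples in each training batch

verbose - whether to print training progress: 0 prints nothing, 1 prints

num_workers - number of worker processes for data preprocessing, default 0

stop_steps - number of epochs for early stopping, default 3

callbacks - callback functions

shuffle - whether to shuffle the dataset; shuffled by default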
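Early stopping is commonly implemented as a patience counter. The sketch below assumes `stop_steps` means "stop after this many consecutive epochs without improvement in validation loss"; the source does not spell out the exact rule bigmodels applies, and `epochs_until_stop` and the loss values are hypothetical.

```python
def epochs_until_stop(val_losses, stop_steps=3):
    """Return (epochs actually run, index of the best epoch) for a list of
    per-epoch validation losses, using a patience-style stopping rule."""
    best = float("inf")
    best_epoch = -1
    since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, since_best = loss, epoch, 0
        else:
            since_best += 1
            if since_best >= stop_steps:
                return epoch + 1, best_epoch  # stop before the remaining epochs
    return len(val_losses), best_epoch

losses = [0.90, 0.80, 0.75, 0.76, 0.78, 0.77, 0.70]  # made-up validation losses
epochs_run, best_epoch = epochs_until_stop(losses, stop_steps=3)
print(epochs_run, best_epoch)  # 6 2
```

In this run the loss stops improving after epoch index 2, so training halts after three more epochs and never reaches the later, lower loss of 0.70. That is the usual trade-off of a small patience value.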

Example code

model.fit(x_train, 
          y_train, 
          val_data=(x_val, y_val), 
          epochs=10, 
          batch_size=2048, 
          verbose=1, 
          num_workers=2,
          stop_steps=3)
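The call above expects `x_train` in the (N, T, F) layout described in the introduction. A minimal sketch of how such windows might be cut from a day-by-day feature table follows. The helper `make_windows`, the toy data, and the choice to align each label with the last step of its window are assumptions; real inputs would more likely be arrays than nested lists.

```python
import math

def make_windows(features, labels, max_seq):
    """Slice a time-ordered feature table into overlapping windows.
    Returns x with shape (N, T, F) and y with shape (N,), where T = max_seq."""
    x, y = [], []
    for end in range(max_seq, len(features) + 1):
        x.append(features[end - max_seq:end])
        y.append(labels[end - 1])  # label of the last step in the window (assumption)
    return x, y

DAYS, F = 12, 3  # toy sizes: 12 time steps, 3 features per step
features = [[float(d + f) for f in range(F)] for d in range(DAYS)]
labels = [float(d) for d in range(DAYS)]

x_windows, y_windows = make_windows(features, labels, max_seq=5)
n_samples = len(x_windows)                 # N = DAYS - max_seq + 1
batches_per_epoch = math.ceil(n_samples / 4)  # with a batch size of 4
print(n_samples, len(x_windows[0]), len(x_windows[0][0]), batches_per_epoch)  # 8 5 3 2
```

The window length must match the `max_seq` passed to the model, and the last dimension must match `input_dim`.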