A layer used in AutoInt that models the correlations between different feature fields with a multi-head self-attention mechanism. Input shape: a 3D tensor with shape ``(batch_size, field_size, embedding_size)``. Output shape: a 3D tensor with shape ``(batch_size, field_size, embedding_size)``.
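As a rough illustration of the shape bookkeeping, the following pure-Python sketch traces the tensor shapes through the stages that are visible in `forward`. The helper name `trace_attention_shapes` and the example sizes are hypothetical and not part of the original module; the sketch only mirrors the validation and head-splitting arithmetic and does not compute any attention.

```python
def trace_attention_shapes(batch_size, field_size, embedding_size, head_num=2):
    """Return the tensor shape at each visible stage of the attention pass.

    Hypothetical helper for illustration only; it mirrors the checks in
    InteractingLayer.__init__ but performs no tensor computation.
    """
    if head_num <= 0:
        raise ValueError("head_num must be an int > 0")
    if embedding_size % head_num != 0:
        raise ValueError("embedding_size is not an integer multiple of head_num!")

    # Each head attends over an equal slice of the embedding dimension.
    att_embedding_size = embedding_size // head_num

    return {
        # Input embeddings, one vector per feature field.
        "inputs": (batch_size, field_size, embedding_size),
        # Query/key/value projections keep the input shape.
        "projected": (batch_size, field_size, embedding_size),
        # Splitting the last dimension and stacking adds a leading head axis.
        "per_head": (head_num, batch_size, field_size, att_embedding_size),
        # Query-key inner products give one score per pair of fields.
        "scores": (head_num, batch_size, field_size, field_size),
    }


shapes = trace_attention_shapes(batch_size=32, field_size=10, embedding_size=8, head_num=2)
for stage, shape in shapes.items():
    print(f"{stage:>9}: {shape}")
```

With these example sizes each of the two heads works on a 4-dimensional slice, and the score tensor holds a 10 x 10 field-by-field matrix per head and per sample.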
| 326 | |
| 327 | |
| 328 | class InteractingLayer(nn.Module): |
| 329 | """A layer used in AutoInt that models the correlations between different feature fields with a multi-head self-attention mechanism. |
| 330 | Input shape |
| 331 | - A 3D tensor with shape: ``(batch_size,field_size,embedding_size)``. |
| 332 | Output shape |
| 333 | - A 3D tensor with shape: ``(batch_size,field_size,embedding_size)``. |
| 334 | Arguments |
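| 335 | - **embedding_size**: Positive integer, dimensionality of the input embeddings. Must be divisible by ``head_num``. |
| 336 | - **head_num**: int. Number of heads in the multi-head self-attention network. |
| 337 | - **use_res**: bool. Whether to use a standard residual connection before the output. |
| | - **scaling**: bool. Whether to divide the attention scores by the square root of the per-head embedding size before the softmax. |
| 338 | - **seed**: A Python integer to use as random seed. |
| | - **device**: str. Device the layer is moved to, ``'cpu'`` by default. |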
| 339 | References |
| 340 | - [Song W, Shi C, Xiao Z, et al. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks[J]. arXiv preprint arXiv:1810.11921, 2018.](https://arxiv.org/abs/1810.11921) |
| 341 | """ |
| 342 | |
| 343 | def __init__(self, embedding_size, head_num=2, use_res=True, scaling=False, seed=1024, device='cpu'): |
| 344 | super(InteractingLayer, self).__init__() |
| 345 | if head_num <= 0: |
| 346 | raise ValueError('head_num must be an int > 0') |
| 347 | if embedding_size % head_num != 0: |
| 348 | raise ValueError('embedding_size is not an integer multiple of head_num!') |
| 349 | self.att_embedding_size = embedding_size // head_num |
| 350 | self.head_num = head_num |
| 351 | self.use_res = use_res |
| 352 | self.scaling = scaling |
| 353 | self.seed = seed |
| 354 | |
| 355 | self.W_Query = nn.Parameter(torch.Tensor(embedding_size, embedding_size)) |
| 356 | self.W_key = nn.Parameter(torch.Tensor(embedding_size, embedding_size)) |
| 357 | self.W_Value = nn.Parameter(torch.Tensor(embedding_size, embedding_size)) |
| 358 | |
| 359 | if self.use_res: |
| 360 | self.W_Res = nn.Parameter(torch.Tensor(embedding_size, embedding_size)) |
| 361 | for tensor in self.parameters(): |
| 362 | nn.init.normal_(tensor, mean=0.0, std=0.05) |
| 363 | |
| 364 | self.to(device) |
| 365 | |
| 366 | def forward(self, inputs): |
| 367 | |
| 368 | if len(inputs.shape) != 3: |
| 369 | raise ValueError( |
| 370 | "Unexpected input dimensions %d, expected 3 dimensions" % (len(inputs.shape))) |
| 371 | |
| 372 | # Project inputs to queries, keys and values: (batch_size, field_size, embedding_size) |
| 373 | querys = torch.tensordot(inputs, self.W_Query, dims=([-1], [0])) |
| 374 | keys = torch.tensordot(inputs, self.W_key, dims=([-1], [0])) |
| 375 | values = torch.tensordot(inputs, self.W_Value, dims=([-1], [0])) |
| 376 | |
| 377 | # Split into heads: (head_num, batch_size, field_size, embedding_size / head_num) |
| 378 | querys = torch.stack(torch.split(querys, self.att_embedding_size, dim=2)) |
| 379 | keys = torch.stack(torch.split(keys, self.att_embedding_size, dim=2)) |
| 380 | values = torch.stack(torch.split(values, self.att_embedding_size, dim=2)) |
| 381 | |
| 382 | inner_product = torch.einsum('bnik,bnjk->bnij', querys, keys) # (head_num, batch_size, field_size, field_size) |
| 383 | if self.scaling: |
| 384 | inner_product /= self.att_embedding_size ** 0.5 |
| 385 | self.normalized_att_scores = F.softmax(inner_product, dim=-1) # (head_num, batch_size, field_size, field_size) |