def adjusted_homophily(graph, y):
    r"""Homophily measure recommended in `Characterizing Graph Datasets for
    Node Classification: Homophily-Heterophily Dichotomy and Beyond
    <https://arxiv.org/abs/2209.06177>`__.

    Adjusted homophily is edge homophily adjusted for the expected number of
    edges connecting nodes with the same class label (taking into account the
    number of classes, their sizes, and the distribution of node degrees among
    them).

    Mathematically it is defined as follows:

    .. math::
        \frac{h_{edge} - \sum_{k=1}^C \bar{p}(k)^2}
        {1 - \sum_{k=1}^C \bar{p}(k)^2},

    where :math:`h_{edge}` denotes edge homophily, :math:`C` denotes the
    number of classes, and :math:`\bar{p}(\cdot)` is the empirical
    degree-weighted distribution of classes:
    :math:`\bar{p}(k) = \frac{\sum_{v\,:\,y_v = k} d(v)}{2|E|}`,
    where :math:`d(v)` is the degree of node :math:`v`.

    It has been shown that adjusted homophily satisfies more desirable
    properties than other homophily measures, which makes it appropriate for
    comparing the levels of homophily across datasets with different numbers
    of classes, different class sizes, and different degree distributions
    among classes.

    Adjusted homophily can be negative. If adjusted homophily is zero, then
    the edge pattern in the graph is independent of node class labels. If it
    is positive, then the nodes in the graph tend to connect to nodes of the
    same class more often, and if it is negative, then the nodes in the graph
    tend to connect to nodes of different classes more often (compared to the
    null model where edges are independent of node class labels).

    Parameters
    ----------
    graph : DGLGraph
        The graph.
    y : torch.Tensor
        The node labels, which is a tensor of shape (|V|).

    Returns
    -------
    float
        The adjusted homophily value.

    Examples
    --------
    >>> import dgl
    >>> import torch

    >>> graph = dgl.graph(([1, 2, 0, 4], [0, 1, 2, 3]))
    >>> y = torch.tensor([0, 0, 0, 0, 1])
    >>> dgl.adjusted_homophily(graph, y)
    -0.1428571492433548
    """
    check_pytorch()
nothing calls this directly
no test coverage detected
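
The formula in the docstring can be checked by hand with a few lines of plain Python. The sketch below is not DGL's implementation (the function body is not shown here); `adjusted_homophily_sketch` is a hypothetical helper, and it assumes that the degree d(v) counts every edge incident to v regardless of direction, so that the degrees sum to 2|E|. Under that assumption it reproduces the value from the docstring example up to floating-point precision.

```python
from collections import Counter


def adjusted_homophily_sketch(src, dst, labels):
    """Hypothetical helper: adjusted homophily from plain edge lists.

    src, dst: lists of edge endpoints (edge i goes from src[i] to dst[i]).
    labels:   list of class labels, one per node.
    """
    num_edges = len(src)

    # Edge homophily: fraction of edges whose endpoints share a label.
    same = sum(1 for u, v in zip(src, dst) if labels[u] == labels[v])
    h_edge = same / num_edges

    # Assumption: each edge adds one to the degree of BOTH endpoints,
    # so the per-class degree sums add up to 2|E|.
    class_degree = Counter()
    for u, v in zip(src, dst):
        class_degree[labels[u]] += 1
        class_degree[labels[v]] += 1

    # sum_k p_bar(k)^2, with p_bar(k) = (degree sum of class k) / (2|E|).
    expected = sum((d / (2 * num_edges)) ** 2 for d in class_degree.values())

    # Raises ZeroDivisionError when every edge endpoint has the same class
    # (expected == 1); the sketch does not guard against that case.
    return (h_edge - expected) / (1 - expected)


# Same graph and labels as the docstring example.
src, dst = [1, 2, 0, 4], [0, 1, 2, 3]
labels = [0, 0, 0, 0, 1]
print(adjusted_homophily_sketch(src, dst, labels))  # approximately -0.1429
```

For this graph, edge homophily is 3/4, the degree-weighted class distribution is (7/8, 1/8), and the expected term is 50/64, which gives (0.75 - 0.78125) / (1 - 0.78125) = -1/7.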