GNN Study | Network Traffic Modeling and Prediction Using Graph Gaussian Processes

    https://ieeexplore.ieee.org/abstract/document/9992240/

     


    I read this paper as the week-1 paper for our GNN study group.

    In this post, I want to focus on how and why this paper combines GNNs and GPs.


    1. Problem Definition



    Since network traffic data lives on a graph-structured topology, this paper formulates traffic prediction as a problem on a graph. The graph $\mathcal{G}$ consists of nodes $\mathcal{V}$ and edges $\mathcal{E}$, with traffic data $y_t$ at time $t$. The goal is to predict traffic at future times based on the observed traffic data. The challenge is that the network contains both observed nodes and missing (unmonitored) nodes, and predictions are needed for both.


    2. GP for Network Traffic Data


    This paper describes the traffic data with a Bayesian factor model based on Gaussian Processes (GPs). The model covers both observed and missing nodes, and assumes the traffic data is generated according to a Gaussian likelihood. The mathematical formulation is given as follows:

    $$ y_t = Wf_t + b + n_t $$

    - $y_t$ : the traffic data at time $t$ across the nodes
    - $W$ : the factor loading matrix, which captures the node-specific characteristics
    - $f_t$ : the latent factor vector, which captures temporal patterns shared among the nodes
    - $b$ : the bias vector, which represents the overall traffic level at each node
    - $n_t$ : additive white noise at time $t$
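
    As a quick sanity check of the generative model, here is a minimal numpy sketch that draws each quantity at random. The dimensions and noise scale are made-up illustrations, and the latent factors $f_t$ are plain Gaussian placeholders rather than the GP draws defined below:

```python
import numpy as np

rng = np.random.default_rng(0)
M, D, T = 10, 3, 100                  # nodes, latent factors, time steps (illustrative)

W = rng.normal(size=(M, D))           # factor loadings: one row per node
b = rng.normal(size=M)                # per-node bias
f = rng.normal(size=(D, T))           # latent factors over time (GP draws in the paper)
n = 0.1 * rng.normal(size=(M, T))     # additive white noise

Y = W @ f + b[:, None] + n            # y_t = W f_t + b + n_t, stacked over all t
print(Y.shape)                        # (M, T): one traffic series per node
```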

     

    To derive the posterior for inference, let’s define the prior distributions for each term:


    1. Prior for $W$


    The factor loading matrix $W$ is given a Gaussian Random Field (GRF) prior, which reflects correlations between nodes: the GRF penalizes large differences between the loadings of neighboring nodes.

    This prior can be represented as:

    $$ p(w_{:,d}) \propto \exp\left(-\frac{\theta_{d1}}{2} \sum_{(i,j) \in \mathcal{E}} (w_{id} - w_{jd})^2 - \frac{\theta_{d2}}{2} \sum_i w_{id}^2\right) $$

    Here, the term  $(w_{id} - w_{jd})^2$  in the prior encourages similarity between connected nodes.

    Using the graph Laplacian matrix $L$, this expression can be written compactly. The Laplacian is defined in terms of the adjacency matrix $A$ and the degree matrix $D$:

    $$ L = D - A $$

    Thus, we can rewrite the prior for $W$ as:

    $$ p(w_{:,d}) \propto \exp\left(-\frac{\theta_{d1}}{2} w_{:,d}^T L w_{:,d} - \frac{\theta_{d2}}{2} w_{:,d}^T w_{:,d}\right) $$
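
    In other words, each column of $W$ has a zero-mean Gaussian prior with precision matrix $\theta_{d1} L + \theta_{d2} I$. Here is a minimal numpy sketch on a toy 5-node chain graph (the graph and $\theta$ values are made up for illustration):

```python
import numpy as np

# Toy graph: a 5-node chain (illustrative, not from the paper)
A = np.diag(np.ones(4), 1)
A = A + A.T                              # symmetric adjacency matrix
L = np.diag(A.sum(axis=1)) - A           # graph Laplacian L = D - A

theta_d1, theta_d2 = 1.0, 0.1            # hypothetical smoothness / shrinkage weights
Q = theta_d1 * L + theta_d2 * np.eye(5)  # GRF precision matrix

# The prior density above is exactly w_{:,d} ~ N(0, Q^{-1})
rng = np.random.default_rng(0)
w_d = rng.multivariate_normal(np.zeros(5), np.linalg.inv(Q))
print(w_d)                               # neighboring entries tend to be similar
```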


    2. Prior for  $f_t$



    The latent variable vector $f_t$ is divided into four components to capture distinct temporal patterns:

    $$ f_{dt} = f_d(t) = f_{1d}(t) + f_{2d}(t) + f_{3d}(t) + f_{4d}(t) $$

    Each component has a specific role:

    - $f_{1d}(t)$ : Short-term trend. This term captures rapidly changing patterns and is modeled with an RBF kernel:

    $$ f_{1d}(t) \sim \text{GP}(0, k_{1d}(t, t')) \quad k_{1d}(t, t') = \alpha_{1d} \exp(-\beta_{1d} |t - t'|^2) $$

    - $f_{2d}(t)$ : Daily quasi-periodic pattern. This term reflects daily patterns, using the product of an RBF kernel and a periodic kernel (a locally periodic kernel with period $\lambda_1$):

    $$ f_{2d}(t) \sim \text{GP}(0, k_{2d}(t, t')) \quad k_{2d}(t, t') = \alpha_{2d} \exp(-\beta_{2d} |t - t'|^2) \exp\left(-\gamma_{1d} \sin^2\left(\frac{\pi}{\lambda_1} (t - t')\right)\right) $$

    - $f_{3d}(t)$ : Weekly periodic pattern. This component captures weekly trends, similar to the daily periodic pattern but with a different period.

    - $f_{4d}(t)$ : Unstructured pattern. Modeled with a Dirac delta function  $\delta$ , this component reflects irregular and unexpected traffic variations:

    $$ f_{4d}(t) \sim \text{GP}(0, k_{4d}(t, t')) \quad k_{4d}(t, t') = \alpha_{nd}\delta(t - t') $$
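
    The four components above can be sketched as numpy kernel functions and summed into one temporal covariance matrix per factor. The hourly grid and all hyperparameter values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf(t, tp, alpha, beta):
    """Short-term trend: k(t,t') = alpha * exp(-beta * |t - t'|^2)."""
    return alpha * np.exp(-beta * (t - tp) ** 2)

def quasi_periodic(t, tp, alpha, beta, gamma, lam):
    """Locally periodic: an RBF envelope times a periodic term with period lam."""
    return (alpha * np.exp(-beta * (t - tp) ** 2)
            * np.exp(-gamma * np.sin(np.pi * (t - tp) / lam) ** 2))

def white(t, tp, alpha):
    """Unstructured component: the Dirac delta becomes an identity on a grid."""
    return alpha * (t == tp).astype(float)

t = np.arange(0.0, 336.0)                  # hourly samples over two weeks
tt, ttp = np.meshgrid(t, t, indexing="ij")
K = (rbf(tt, ttp, 1.0, 0.5)                          # f_1: short-term trend
     + quasi_periodic(tt, ttp, 1.0, 1e-4, 2.0, 24.0)   # f_2: daily, lambda_1 = 24
     + quasi_periodic(tt, ttp, 0.5, 1e-5, 2.0, 168.0)  # f_3: weekly, lambda_2 = 168
     + white(tt, ttp, 0.1))                            # f_4: unstructured
print(K.shape)                             # (336, 336) covariance for one factor
```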

     


    3. Prior for $b$



    The prior for the bias term $b$ encourages similar biases among neighboring nodes, acting as a regularizer. This regularization lets the nodes share information and captures an average traffic level across the network:

    $$ p(b) = N(Za, Q_{D+1}^{-1}) $$

    Here $Q_{D+1}$ is a graph-based precision matrix of the same GRF form used for the columns of $W$ (hence the subscript $D{+}1$, treating $b$ like an extra column), and the mean $Za$ ties the node biases to a shared level $a$.

     

    4. Covariance Structure


    The covariance structure reflects both spatial and temporal interactions among nodes, which is what drives prediction accuracy: nodes with similar traffic patterns (similar loadings) end up with high covariance.

    The covariance structure is defined as follows:

    $$ \mathbb{E}[y_m(t)\, y_{m'}(t')] = \sum_{d=1}^D \mathbb{E}[w_{md}\, w_{m'd}]\, \mathbb{E}[f_d(t)\, f_d(t')] + \sigma $$
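
    Under the priors above, $\mathbb{E}[w_{md} w_{m'd}]$ is an entry of the GRF covariance $S_d = (\theta_{d1}L + \theta_{d2}I)^{-1}$ and $\mathbb{E}[f_d(t) f_d(t')]$ is an entry of the temporal kernel matrix $K_d$ from the kernel sketch above. Here is a sketch of how the joint covariance over all (node, time) pairs could be assembled with Kronecker products; the helper name is mine, not the paper's:

```python
import numpy as np

def joint_covariance(S_list, K_list, sigma):
    """Covariance of vec(Y): sum over factors d of S_d (kron) K_d, plus noise.

    S_list[d] -- (M, M) spatial covariance of the d-th loading column
    K_list[d] -- (T, T) temporal kernel matrix of the d-th latent factor
    """
    M, T = S_list[0].shape[0], K_list[0].shape[0]
    C = sum(np.kron(S, K) for S, K in zip(S_list, K_list))
    return C + sigma * np.eye(M * T)    # additive white-noise variance
```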

     

    3. Model Inference and Prediction


    1. Inference Process


    The inference step updates the model’s parameters based on the observed traffic data. Since there is no data for the missing nodes, we use a marginal model that integrates out the parameters of the missing nodes and considers only the observed nodes.

    • Constructing the Marginal Model for Observed Nodes

    First, we calculate the marginal distribution for the parameters $W_o$ and the bias $b_o$ of the observed nodes. The parameters for observed and missing nodes are defined separately as $W_o$ and $W_{\bar{o}}$, respectively.

    The marginal distribution is computed as follows:

    $$ p(W_o) = \int p(W_o, W_{\bar{o}}) \, dW_{\bar{o}}, \quad p(b_o) = \int p(b_o, b_{\bar{o}}) \, db_{\bar{o}} $$
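
    Because the GRF priors are Gaussian, these integrals are analytic: the marginal of a zero-mean Gaussian with precision $Q$ over a subset of coordinates is again Gaussian, with covariance equal to the corresponding block of $Q^{-1}$. A small numpy check (the 5-node graph and the observed set are made up):

```python
import numpy as np

# Reuse the 5-node chain-graph precision Q from the GRF sketch above
A = np.diag(np.ones(4), 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A
Q = 1.0 * L + 0.1 * np.eye(5)

obs = [0, 2, 3]                        # hypothetical indices of observed nodes
Sigma = np.linalg.inv(Q)               # full prior covariance over all nodes
Sigma_o = Sigma[np.ix_(obs, obs)]      # marginal covariance over observed nodes

# Monte Carlo sanity check: sample full vectors, keep only observed entries
rng = np.random.default_rng(0)
w = rng.multivariate_normal(np.zeros(5), Sigma, size=200_000)
print(np.abs(np.cov(w[:, obs].T) - Sigma_o).max())   # small (sampling error)
```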

    • Objective Equation

    To optimize the model, we maximize the marginal log-likelihood with respect to the hyperparameters $\psi$:

    $$ \psi_{\text{opt}} = \arg\max_{\psi} \log Z(\psi) $$

    where $Z(\psi)$ is the marginal likelihood dependent only on the hyperparameters. Through this optimization, the model finds the optimal hyperparameters.
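
    For intuition, in plain GP regression on a dataset $y$ of $n$ points, with kernel matrix $K_\psi$ and noise variance $\sigma^2$, the marginal log-likelihood has the familiar closed form:

    $$ \log Z(\psi) = -\frac{1}{2} y^\top (K_\psi + \sigma^2 I)^{-1} y - \frac{1}{2} \log\left|K_\psi + \sigma^2 I\right| - \frac{n}{2}\log 2\pi $$

    Here, however, the missing nodes and the factor structure make $Z(\psi)$ hard to compute directly, which motivates the variational approximation described next.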

     

    2. Variational Inference



    Since calculating the marginal likelihood directly is challenging, we approximate it using Variational Bayes (VB).

    • Posterior Approximation

    To simplify the posterior distribution, we assume the distributions of each variable are independent:

    $$ q(h) = q(b_o)q(a) \prod_d q(f_{:,d})q(w_{o,:,d}) $$

    This factorization simplifies the computation significantly. The optimal variational distribution $q(h_j)$ for each variable is given as:

    $$ q(h_j) \propto \exp \left( \mathbb{E}_{q(h_{\setminus j})} \left[ \log p(h, Y) \right] \right) $$

    That is, the optimal factor for each variable $h_j$ is obtained by taking the expectation of the log joint probability over all the other variables.


    3. Hyperparameter Optimization


    The hyperparameters are optimized by minimizing the KL divergence, which reduces the difference between the approximate distribution and the true posterior. This is defined as:

    $$ \psi^{(i)} = \arg\max_{\psi} -\text{KL}(q^{(i)}(h) \parallel p(h \mid Y; \psi)) $$

    Through this process, we find the optimal values for the hyperparameters.
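
    Minimizing this KL divergence is equivalent to maximizing the evidence lower bound (ELBO), since the log marginal likelihood decomposes as:

    $$ \log Z(\psi) = \underbrace{\mathbb{E}_{q(h)}\!\left[\log \frac{p(h, Y; \psi)}{q(h)}\right]}_{\text{ELBO}} + \text{KL}\left(q(h) \,\|\, p(h \mid Y; \psi)\right) $$

    Since the left-hand side does not depend on $q$, decreasing the KL term increases the ELBO, and the ELBO is the quantity that can actually be evaluated under the factorized $q$. The superscript $(i)$ in the equation above indicates that the algorithm alternates between updating the variational factors and updating $\psi$.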


    4. Prediction


    Once the model is optimized, we can predict future traffic. There are two types of predictions:

    • Prediction for Observed Nodes

    Future traffic for observed nodes is predicted with the posterior predictive distribution, which computes the expected value by integrating over the latent variables ($W_o$, $f$, $b_o$) under their variational posteriors.

     

    • Prediction for Missing Nodes

    The prediction for missing nodes is inferred from their interactions with the observed nodes, using a conditional distribution to estimate the traffic at the missing nodes.
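
    The conditional-distribution step is standard Gaussian conditioning. A minimal numpy sketch, assuming we already have a joint mean and covariance over the stacked missing and observed traffic (e.g., built with the joint_covariance helper above); the function name and interface are mine:

```python
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, mis_idx, y_obs):
    """Return the conditional N(mu_m|o, Sigma_m|o) of the missing block
    given observed values y_obs."""
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    G = S_mo @ np.linalg.inv(S_oo)            # gain mapping observations to missing
    mu_cond = mu[mis_idx] + G @ (y_obs - mu[obs_idx])
    Sigma_cond = S_mm - G @ S_mo.T            # uncertainty shrinks with observations
    return mu_cond, Sigma_cond
```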
