CS149 Programming Assignment 5 - Big Graph Processing

Stanford CS149 (Fall 2023) Parallel Computing

Optional Programming Assignment 5: Big Graph Processing in OpenMP

Part 1: Parallel “Top Down” Breadth-First Search (20 points)

使用 OpenMP 加速从起点开始的 BFS

看了一下给出的代码，就是维护两层队列，一层是当前层，一层是下一层，那么能够加速的就是每一次向下一层扩展的过程。

这里选择对第一层 for 循环进行并行化，由于每个点的边数不一样，所以选择动态的调度方式，并且每个线程都有自己的 buffer，最后再合并。

提示中说到了部分情况可以避免 CAS 操作，也就是提前判断一下

void top_down_step(Graph g, vertex_set *frontier, vertex_set *new_frontier,
                   int *distances) {
  const int new_dist = distances[frontier->vertices[0]] + 1;
#pragma omp parallel
  {
    Vertex *buffer = new Vertex[g->num_nodes];
    int buffer_size = 0;

#pragma omp for schedule(dynamic, 200)
    for (int i = 0; i < frontier->count; i++) {
      const int node = frontier->vertices[i];
      for (const Vertex *x = outgoing_begin(g, node),
                        *limit = outgoing_end(g, node);
           x < limit; x++) {
        if (distances[*x] == NOT_VISITED_MARKER &&
            __sync_bool_compare_and_swap(distances + *x, NOT_VISITED_MARKER,
                                         new_dist)) {
          buffer[buffer_size++] = *x;
        }
      }
    }

    int index = __sync_fetch_and_add(&new_frontier->count, buffer_size);
    memcpy(new_frontier->vertices + index, buffer,
           buffer_size * sizeof(Vertex));

    delete[] buffer;
  }
}

Part 2: “Bottom Up” BFS (25 points)

反过来的 BFS，每次枚举没有到达的点判断是否可以到达

一开始试了一下维护没有到达的点的数组，但发现图似乎是不完全联通的，最后会剩下大量的点反复拷贝，反而效果不是很好，于是就保持和 Part1 一样

最终效果也不是很完美，只有 reference 的 75% 左右

void bottom_up_step(Graph g, vertex_set *frontier, vertex_set *new_frontier,
                    int *distances) {
  const int new_dist = distances[frontier->vertices[0]] + 1;

#pragma omp parallel
  {
    Vertex *buffer = new Vertex[g->num_nodes];
    int buffer_size = 0;

#pragma omp for schedule(dynamic, 2000)
    for (int i = 0; i < g->num_nodes; ++i) {
      if (distances[i] != NOT_VISITED_MARKER)
        continue;
      for (const Vertex *x = incoming_begin(g, i), *limit = incoming_end(g, i);
           x < limit; ++x) {
        if (distances[*x] == new_dist - 1) {
          distances[i] = new_dist;
          buffer[buffer_size++] = i;
          break;
        }
      }
    }

    int index = __sync_fetch_and_add(&new_frontier->count, buffer_size);
    memcpy(new_frontier->vertices + index, buffer,
           buffer_size * sizeof(Vertex));

    delete[] buffer;
  }
}

void bfs_bottom_up(Graph graph, solution *sol) {
  vertex_set list1;
  vertex_set list2;
  vertex_set_init(&list1, graph->num_nodes);
  vertex_set_init(&list2, graph->num_nodes);

  vertex_set *frontier = &list1;
  vertex_set *new_frontier = &list2;

  for (int i = 0; i < graph->num_nodes; i++)
    sol->distances[i] = NOT_VISITED_MARKER;

  frontier->vertices[frontier->count++] = ROOT_NODE_ID;
  sol->distances[ROOT_NODE_ID] = 0;

  while (frontier->count != 0) {
    vertex_set_clear(new_frontier);

    bottom_up_step(graph, frontier, new_frontier, sol->distances);

    vertex_set *tmp = frontier;
    frontier = new_frontier;
    new_frontier = tmp;
  }
}

Part 3: Hybrid BFS (25 points)

混合 Part 1 和 Part 2 的 BFS

这里简单设了一个 frontier->count < graph->num_nodes * 0.05 的条件，来判断使用哪种 BFS

void bfs_hybrid(Graph graph, solution *sol) {
  vertex_set list1;
  vertex_set list2;
  vertex_set_init(&list1, graph->num_nodes);
  vertex_set_init(&list2, graph->num_nodes);

  vertex_set *frontier = &list1;
  vertex_set *new_frontier = &list2;

  for (int i = 0; i < graph->num_nodes; i++)
    sol->distances[i] = NOT_VISITED_MARKER;

  frontier->vertices[frontier->count++] = ROOT_NODE_ID;
  sol->distances[ROOT_NODE_ID] = 0;

  // int remaining_nodes = graph->num_nodes - 1;

  while (frontier->count != 0) {
    vertex_set_clear(new_frontier);

    if (frontier->count < graph->num_nodes * 0.05) {
      top_down_step(graph, frontier, new_frontier, sol->distances);
    } else {
      bottom_up_step(graph, frontier, new_frontier, sol->distances);
    }

    // remaining_nodes -= frontier->count;

    vertex_set *tmp = frontier;
    frontier = new_frontier;
    new_frontier = tmp;
  }
}

Your Code: Timing Summary
Threads  Top Down          Bottom Up         Hybrid
   1:    4.81 (1.00x)      7.03 (1.00x)      2.29 (1.00x)
   2:    2.58 (1.86x)      3.66 (1.92x)      1.24 (1.84x)
   4:    1.40 (3.43x)      2.12 (3.31x)      0.68 (3.37x)
   8:    0.98 (4.91x)      1.37 (5.14x)      0.47 (4.91x)
  16:    0.85 (5.64x)      1.22 (5.75x)      0.52 (4.43x)
----------------------------------------------------------
Reference: Timing Summary
Threads  Top Down          Bottom Up         Hybrid
   1:    5.21 (1.00x)      5.77 (1.00x)      3.06 (1.00x)
   2:    2.81 (1.86x)      2.99 (1.93x)      1.63 (1.88x)
   4:    1.54 (3.38x)      1.76 (3.28x)      0.91 (3.35x)
   8:    1.05 (4.96x)      1.14 (5.05x)      0.63 (4.86x)
  16:    1.00 (5.19x)      0.84 (6.89x)      0.64 (4.82x)
----------------------------------------------------------
Correctness: 

Speedup vs. Reference: 
Threads       Top Down          Bottom Up             Hybrid
   1:             1.08               0.82               1.34
   2:             1.09               0.82               1.31
   4:             1.10               0.83               1.35
   8:             1.07               0.84               1.36
  16:             1.18               0.68               1.23

最后的得分（记得把内存开大，我 wsl 的 8G 内存根本不够用，一直被杀进程）

--------------------------------------------------------------------------
SCORES :                    |   Top-Down    |   Bott-Up    |    Hybrid    |
--------------------------------------------------------------------------
grid1000x1000.graph         |      2.00 / 2 |     3.00 / 3 |     3.00 / 3 |
--------------------------------------------------------------------------
soc-livejournal1_68m.graph  |      1.95 / 2 |     3.00 / 3 |     3.00 / 3 |
--------------------------------------------------------------------------
com-orkut_117m.graph        |      2.00 / 2 |     3.00 / 3 |     3.00 / 3 |
--------------------------------------------------------------------------
random_500m.graph           |      7.00 / 7 |     8.00 / 8 |     8.00 / 8 |
--------------------------------------------------------------------------
rmat_200m.graph             |      7.00 / 7 |     4.87 / 8 |     8.00 / 8 |
--------------------------------------------------------------------------
TOTAL                                                      |  66.82 / 70 |
--------------------------------------------------------------------------