[資料結構] 使用 C 語言：串列走訪 (List Traversal)

前言

本文探討兩個和串列走訪相關的議題，一個是迭代器 (iterator)，一個是和串列走訪相關的高階函式 (higher-order function)。這兩個議題不是典型資料結構教材會提到的主題，如果覺得目前用不到先略過也無妨。

迭代器 (Iterator)

串列的迭代器的抽象資料結構

迭代器 (iterator) 是在不暴露資料結構內部的前提下走訪該資料結構的公開方法。對資料結構實作者而言，藉由實作迭代器可以隱藏實作細節；對資料結構使用者來說，不用知道資料結構的內部實作也可以走訪該結構；對雙方來說，是雙贏的局面。

迭代器的抽象資料結構有顯性和隱性兩種。以下是顯性的迭代器：

L is a list.
I is an iterator.

sub Start(L): I
sub Next(I): I
sub End(I): bool
sub Value(I): data

顯性的迭代器會回傳代表迭代器的變數 I。使用範例如下：

for I <- Start(L); not End(I); I <- Next(I) do
    data <- value(I)
    Do something on data
end for

隱性迭代器的抽象資料結構如下：

L is a list.

sub Start(L): bool
sub Next(L): bool
sub End(L): bool
sub Value(L): data

和顯性迭代器不同，這時候不會有額外的變數。使用範例如下：

if Start(L) then
    while not End(L) do
        data <- value(L)
        Do something on data
        Next(L)
    end while
end if

實作迭代器

要實作外部迭代器可參考以下虛擬碼：

I belongs to Node type.

sub Start(L):
    assert L != null

    I <- L->head
    return I

sub Next(I):
    if I == null then
        return null
    end if

    I <- I->next

    return I

sub End(I):
    check whether I == null

sub Value(I):
    assert I != null

    return I->data

在隱性迭代器中，迭代器 I 會變成串列 L 的內部屬性，實作方式則大同小異，此處不列出。以下是等效的 C 語言外部迭代器程式碼：

Node * list_start(const List *self)
{
    assert(self);

    // Init an iterator.
    Node *iter = self->head;

    return iter;
}

Node * list_next(Node *self)
{
    if (!self) {
        return NULL;
    }

    self = self->next;

    return self;
}

bool list_end(Node *self)
{
    return self == NULL;
}

和串列走訪相關的高階函式 (Higher-Order Function)

串列相關高階函式的抽象資料結構

L, K are lists.

// User-defined function objects.
cmp <- fn(a, b): bool
filter <- fn(a): bool
mapper <- fn(a): result 
reducer <- fn(a, b): result

// ADT w/o side effects.
sub Any(L, filter): bool
sub All(L, filter): bool
sub Find(L, filter): (result, bool)
sub Select(L, filter): K
sub Sort(L, cmp): K
sub Map(L, mapper): K
sub Reduce(L, reducer): result

// Alternative ADT w/ side effects.
sub SelectMut(L, filter): void
sub SortMut(L, cmp): void
sub MapMut(L, mapper): void

Any(L, filter)
All(L, filter)
Find(L, filter)
Select(L, filter)
SelectMut(L, filter)
Sort(L, cmp)
SortMut(L, cmp)
Map(L, mapper)
MapMut(L, mapper)
Reduce(L, reducer)

基本上，帶有高階函式風格的串列走訪函式都可寫成 traverse(list, action) 的形式，我們只要預先寫好 traverse 的部分，使用者可隨需求自行填入 action 來搭配 traverse 以操作 list。

在這類函式中，有兩種風格，一種是無副作用 (side effect) 的，一種是有副作用的。前者不會改變原有的串列，而會回傳新的串列，後者則會直接修改原有的串列。兩種風格各有優劣，讀者可自行選擇喜好的方式。

有些進階的程式設計教材會考慮多執行緒的情境，這種實作較有用，但會讓實作更複雜，本文暫不考慮這種情境。

`Any`：確認串列中至少有一個元素符合條件

Any 的虛擬碼如下：

sub Any(L, filter):
    assert L != null

    curr <- L->head
    while curr != null do
        if filter(curr->data) then
            return true
        end if

        curr <- curr->next
    end while

    return false

只要串列 L 中有符合 filter 的元素，就回傳 true 中止函式，若所有元素皆不符合則回傳 false。其 C 語言實作如下：

typedef bool (*filterFn) (int);

bool list_any(const List *self, filterFn filter)
{
    assert(self);

    Node *curr = self->head;

    if (!curr) {
        return false;
    }

    while (curr) {
        if (filter(curr->data)) {
            return true;
        }

        curr = curr->next;
    }

    return false;
}

`All`：確認串列中所有元素皆符合條件

All 的虛擬碼如下：

sub All(L, filter):
    assert L != null

    curr <- L->head
    while curr != null do
        if not filter(curr->data) then
            return false
        end if

        curr <- curr->next
    end while

    return true

只要串列 L 有不符合 filter 的元素，就回傳 false 中止函式，若所有元素皆符合則回傳 true。其 C 語言實作如下：

typedef bool (*filterFn) (int);

bool list_all(const List *self, filterFn filter)
{
    assert(self);

    Node *curr = self->head;

    if (!curr) {
        return false;
    }

    while (curr) {
        if (!filter(curr->data)) {
            return false;
        }

        curr = curr->next;
    }

    return true;
}

`Find`：從串列中根據特定條件找出元素

Find 的虛擬碼如下：

sub Find(L, filter):
    assert L != null

    curr <- L->head
    i <- 0
    while curr != null do
        if filter(curr->data) then
            return (i, true)
        end if

        curr <- curr->next
        i <- i + 1
    end while

    return (0, false)

在我們這個演算法中，若有符合條件 filter 的元素，會直接回傳 data 並中止函式，若所有元素皆不符合條件，則回傳空值。如果想要得到所有符合條件的元素，請參考下一節的 Select 函式。

以下是等效的 C 語言實作：

typedef bool (*filterFn) (int);

bool list_find(const List *self, filterFn filter, size_t *out)
{
    assert(!list_is_empty(self));

    Node *curr = self->head;
    size_t i = 0;
    while (curr) {
        if (filter(curr->data)) {
            *out = i;
            return true;
        }

        curr = curr->next;
        i++;
    }

    *out = 0;
    return false;
}

`Select`：從串列中取得所有符合條件的元素

以下是 Select 的虛擬碼：

sub Select(L, filter):
    assert L != null

    K <- a new empty list.
    curr <- L->head
    while curr != null do
        if filter(curr->data) then
            Push(K, curr->data)
        end

        curr <- curr->next
    end while

    return K

由於此函式會回傳多個元素，我們要設計回傳的方式。一般來說，有兩種方法；一種是做一個新串列，將符合條件的元素放入該串列後回傳；另一種是將函式做成迭代器；前者會簡單一些，這裡展示第一種做法。

以下是等效的 C 程式實作：

typedef bool (*filterFn) (int);

bool list_select_mut(List **self, filterFn filter)
{
    assert(*self);

    List *out = list_new();
    if (!out) {
        return false;
    }

    Node *curr = (*self)->head;
    while (curr) {
        if (filter(curr->data)) {
            if (!list_push(out, curr->data)) {
                list_free(out);

                return false;
            }
        }

        curr = curr->next;
    }

    list_free(*self);
    *self = out;

    return true;
}

`Sort`：根據特定條件對串列排序

本節採用插入排序法 (insertion)；但因內部為串列，故採新增串列的方式來實作，在空間開銷上較大。以下是 Sort 的虛擬碼：

sub Sort(L, filter):   
    K <- a new empty list

    if L->head == null
        return K
    end if

    curr <- L->head
    Push(K, curr->data)
    curr <- curr->next

    while curr != null do
        InsertBy(K, curr->data, filter)

        curr <- curr->next
    end while

    return K

這個程式的關鍵在 InsertBy(L, value, predicate) 函式，下文會說明。以下是等效的 C 語言程式碼：

typedef bool (*predicateFn) (int, int);

List * list_sort(const List *self, predicateFn filter)
{
    assert(self);

    if (list_is_empty(self)) {
        return NULL;
    }

    List *out = list_new();
    if (!out) {
        return out;
    }

    Node *curr = self->head;
    list_push(out, curr->data);
    curr = curr->next;

    while (curr) {
        list_insert_by(out, curr->data, filter);

        curr = curr->next;
    }

    return out;
}

InsertBy(L, value, predicate) 本身就是有序串列的插入元素的過程，但我們將節點關係拉到外部，變成參數。以下是其程式碼：

sub InsertBy(L, value, predicate): void
    node <- Node(value)

    if IsEmpty(L) then
        L.head <- node
        L.tail <- node
        L.size += 1
        return
    end if

    if L.head == L.tail then
        if predicate(value, L.head.data) then
            node.next <- L.head
            L.head.prev <- node
            L.head <- node
        else
            L.tail.next <- node
            node.prev <- L.tail
            L.tail <- node
        end if

        L.size += 1
        return
    end if

    p <- null
    q <- L.head
    while q.next != null do
        if predicate(value, q.data) then
            if p == null then
                node.next <- L.head
                L.head.prev <- node
                L.head <- node
            else
                p.next <- node
                node.prev <- p

                q.prev <- node
                node.next <- q
            end if

            break
        end if

        p <- q
        q <- q.next
    end while

    if q == L.tail then
        L.tail.next <- node
        node.prev <- L.tail
        L.tail <- node
    end if

    L.size += 1

當串列為空時，直接放入元素即可。

當串列僅存有單一元素時，會依情境放入串列的頭端或尾端，由於會影響頭 (或尾) 端指標所指向的元素，需分開處理。

走訪串列的方式為使用兩個指標來走訪，一個指標指向目標節點，另一個指標指向前一個節點。

當串列有多個元素時，需再細分如下：

放入串列頭端
放入串列尾端
放入串列其他位置

當放入串列的頭 (或尾) 端時，會影響串列頭 (或尾) 端指標指向的節點，故需獨立出來處理。以下是等效的 C 語言程式碼：

typedef bool (*predicateFn) (int, int);

bool list_insert_by(List *self, int value, predicateFn predicate)
{
    assert(self);

    Node *node = node_new(value);
    if (!node) {
        return false;
    }

    if (!(self->head)) {
        self->head = node;
        self->tail = node;

        self->size++;

        return true;
    }

    if (self->head == self->tail) {
        if (predicate(value, self->head->data)) {
            node->next = self->head;
            self->head->prev = node;
            self->head = node;
        }
        else {
            self->tail->next = node;
            node->prev = self->tail;
            self->tail = node;
        }

        self->size++;
        return true;
    }

    Node *p = NULL;
    Node *q = self->head;
    while (q->next) {
        if (predicate(value, q->data)) {
            if (!p) {
                node->next = self->head;
                self->head->prev = node;
                self->head = node;
            } else {
                p->next = node;
                node->prev = p;

                q->prev = node;
                node->next = q;
            }

            break;
        }

        p = q;
        q = q->next;
    }

    if (q == self->tail) {
        self->tail->next = node;
        node->prev = self->tail;
        self->tail = node;
    }

    self->size++;

    return true;
}

`Map`：走訪串列並根據特定條件修改元素

以下是 Map 的虛擬碼：

sub Map(L, mapper):
    assert L != null

    K <- a new empty list
    curr <- L->head
    while curr != null do
        Push(K, mapper(curr->data))

        curr <- curr->next
    end while

    return K

在本節中，我們另生成新串列，若要改成直接修改現有串列也可以。以下是等效的 C 語言程式碼：

typedef int (*mapFn) (int);

bool list_map(List **self, mapFn mapper)
{
    assert(*self);

    List *result = list_new();
    if (!result) {
        return false;
    }

    Node *p = (*self)->head;
    while (p) {
        if (!list_push(result, mapper(p->data))) {
            list_free(result);
            result = NULL;
            return false;
        }

        p = p->next;
    }

    list_free(*self);
    *self = result;

    return true;
}

`Reduce`：將串列所有元素根據特定條件合併

以下是 Reduce 的虛擬碼：

sub Reduce(L, reducer):
    assert not IsEmpty(L)

    curr <- L->head
    a <- curr->data
    curr <- curr->next // Iterate one step.

    while curr != null do
        a <- reducer(a, curr->data)

        curr <- curr->next
    end while

    return a

若串列只有一個元素時，while 迴圈不會運轉，函式仍是相容的。以下是等效的 C 語言程式碼：

typedef int (*reduceFn) (int, int);

int list_reduce(const List *self, reduceFn reducer)
{
    assert(!list_is_empty(self));

    Node *curr = self->head;
    int a = curr->data;
    curr = curr->next;

    while (curr) {
        a = reducer(a, curr->data);

        curr = curr->next;
    }

    return a;
}

Function Chaining：簡潔而緊湊的函數式程式

由於我們實作的語言是 C，無法寫得很簡潔，像下例使用 Ruby 做 function chaining 的語法就無法達成：

# Ruby excerpt.
# Chaining higher-order functions.

(1..10)
    .select { |n| n % 2 != 0 }
    .map { |n| n * n }
    .reduce { |a, b| a + b } \
        == (1 * 1 + 3 * 3 + 5 * 5 + 7 * 7 + 9 * 9) \
            or raise "Wrong result"

對等的 C 程式碼會 verbose 得多，因為兩者設計的理念不同，無法混為一談。

在演算法上的效率

以下是各個高階函式的效率：

Any(L, filter)：O(n)
All(L, filter)：O(n)
Find(L, filter)：O(n)
Select(L, filter)：O(n)
Sort(L, cmp)：O(n^2)
Map(L, mapper)：O(n)
Reduce(L, reducer)：O(n)

由於高階函式皆需走訪整個串列，效率至少為 O(n)。高階函式的功能在於其功能性而非程式效率。

關於作者

位元詩人 (ByteBard) 是資訊領域碩士，喜歡用開源技術來解決各式各樣的問題。這類技術跨平台、重用性高、技術生命長。

除了開源技術以外，位元詩人喜歡日本料理和黑咖啡，會一些日文，有時會自助旅行。

前言

迭代器 (Iterator)

串列的迭代器的抽象資料結構

實作迭代器

和串列走訪相關的高階函式 (Higher-Order Function)

串列相關高階函式的抽象資料結構

Any：確認串列中至少有一個元素符合條件

All：確認串列中所有元素皆符合條件

Find：從串列中根據特定條件找出元素

Select：從串列中取得所有符合條件的元素

Sort：根據特定條件對串列排序

Map：走訪串列並根據特定條件修改元素

Reduce：將串列所有元素根據特定條件合併