Python高级数据结构——B树和B+树
Python中的B树和B+树:高级数据结构解析
B树和B+树是一种多叉树,常用于处理大量数据的存储和检索操作。它们广泛应用于文件系统、数据库索引等领域,具有高效的插入、删除和搜索性能。在本文中,我们将深入讲解Python中的B树和B+树,包括它们的基本概念、插入、删除和搜索操作,并使用代码示例演示它们的使用。
基本概念
1. B树和B+树的定义
B树和B+树是一种自平衡的搜索树,其每个节点可以包含多个键值对。B树和B+树的主要区别在于节点的定义和遍历方式。
B树: 每个节点包含键值对,并具有子节点。B树的节点包含的键值对数量介于t-1和2t-1之间,其中t是树的最小度数。
B+树: 内部节点只包含键值,不存储数据。所有的数据都存储在叶子节点上,形成有序链表。B+树的节点包含的键值对数量介于t和2t-1之间。
class BNode:
def __init__(self, is_leaf=True):
self.keys = []
self.children = []
self.is_leaf = is_leaf
class BTree:
def __init__(self, t):
self.root = BNode()
self.t = t # 最小度数
插入操作
2. B树和B+树的插入
B树和B+树的插入操作包括两个步骤:首先找到要插入的位置,然后将键值对插入到节点中。插入后,可能需要进行节点分裂操作,以保持树的平衡性。
class BTree:
# ... (前面的定义)
def insert(self, key):
root = self.root
if len(root.keys) == 2 * self.t - 1:
new_root = BNode(is_leaf=False)
new_root.children.append(root)
self._split_child(new_root, 0)
self.root = new_root
self._insert_non_full(new_root, key)
else:
self._insert_non_full(root, key)
def _insert_non_full(self, x, key):
i = len(x.keys) - 1
if x.is_leaf:
x.keys.append(None)
while i >= 0 and key < x.keys[i]:
x.keys[i + 1] = x.keys[i]
i -= 1
x.keys[i + 1] = key
else:
while i >= 0 and key < x.keys[i]:
i -= 1
i += 1
if len(x.children[i].keys) == 2 * self.t - 1:
self._split_child(x, i)
if key > x.keys[i]:
i += 1
self._insert_non_full(x.children[i], key)
def _split_child(self, x, i):
t = self.t
y = x.children[i]
z = BNode(is_leaf=y.is_leaf)
x.children.insert(i + 1, z)
x.keys.insert(i, y.keys[t - 1])
z.keys = y.keys[t:2 * t - 1]
y.keys = y.keys[0:t - 1]
if not y.is_leaf:
z.children = y.children[t:2 * t]
y.children = y.children[0:t]
删除操作
3. B树和B+树的删除
B树和B+树的删除操作同样包括两个步骤:首先找到要删除的位置,然后从节点中删除键值对。删除后,可能需要进行节点合并操作,以保持树的平衡性。
class BTree:
# ... (前面的定义)
def delete(self, key):
root = self.root
if len(root.keys) == 0:
return
self._delete(root, key)
if len(root.keys) == 0 and not root.is_leaf:
self.root = root.children[0]
def _delete(self, x, key):
t = self.t
i = 0
while i < len(x.keys) and key > x.keys[i]:
i += 1
if i < len(x.keys) and key == x.keys[i]:
if x.is_leaf:
del x.keys[i]
else:
self._delete_internal(x, i)
elif not x.is_leaf:
self._delete_recursive(x, i, key)
def _delete_recursive(self, x, i, key):
t = self.t
child = x.children[i]
if len(child.keys) == t - 1:
self._fix_child(x, i)
i -= 1
self._delete(child, key)
def _delete_internal(self, x, i):
t = self.t
key = x.keys[i]
if x.children[i].is_leaf:
predecessor = self._get_predecessor(x.children[i])
x.keys[i] = predecessor
self._delete(x.children[i], predecessor)
else:
successor = self._get_successor(x.children[i])
x.keys[i] = successor
self._delete(x.children[i], successor)
def _get_predecessor(self, x):
while not x.is_leaf:
x = x.children[-1]
return x.keys[-1]
def _get_successor(self, x):
while not x.is_leaf:
x = x.children[0]
return x.keys[0]
def _fix_child(self, x, i):
t = self.t
if i > 0 and len(x.children[i - 1].keys) >= t:
self._borrow_from_prev(x, i)
elif i < len(x.children) - 1 and len(x.children[i + 1].keys) >= t:
self._borrow_from_next(x, i)
elif i > 0:
self._merge(x, i - 1)
else:
self._merge(x, i)
def _borrow_from_prev(self, x, i):
child = x.children[i]
sibling = x.children[i - 1]
child.keys.insert(0, x.keys[i - 1])
x.keys[i - 1] = sibling.keys.pop()
if not child.is_leaf:
child.children.insert(0, sibling.children.pop())
def _borrow_from_next(self, x, i):
child = x.children[i]
sibling = x.children[i + 1]
child.keys.append(x.keys[i])
x.keys[i] = sibling.keys.pop(0)
if not child.is_leaf:
child.children.append(sibling.children.pop(0))
def _merge(self, x, i):
t = self.t
child = x.children[i]
sibling = x.children[i + 1]
child.keys.append(x.keys.pop(i))
child.keys += sibling.keys
if not child.is_leaf:
child.children += sibling.children
del x.children[i + 1]
搜索操作
4. B树和B+树的搜索
B树和B+树的搜索操作与普通的二叉搜索树类似,通过递归实现。
class BTree:
# ... (前面的定义)
def search(self, key):
return self._search(self.root, key)
def _search(self, x, key):
i = 0
while i < len(x.keys) and key > x.keys[i]:
i += 1
if i < len(x.keys) and key == x.keys[i]:
return True
elif x.is_leaf:
return False
else:
return self._search(x.children[i], key)
应用场景
B树和B+树广泛应用于文件系统、数据库索引等需要大量数据存储和检索的场景。它们的平衡性和高效性能使得它们成为处理大规模数据的理想选择。
总结
B树和B+树是一种多叉搜索树,具有高效的插入、删除和搜索性能。它们通过节点的合并和分裂操作来保持平衡,适用于大规模数据的存储和检索。在Python中,我们可以使用类似上述示例的代码实现B树和B+树,并根据实际问题定制插入、删除和搜索的操作。理解B树和B+树的基本概念和操作,将有助于更好地应用它们解决实际问题,提高数据存储和检索的效率。