Milvus Vector Indexes Explained

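A comprehensive look at the index types Milvus supports: how they work, how to configure their parameters, and how to choose between them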

1. Why Vector Indexes Are Needed

The Essence of Vector Search

The core operation of a vector database is approximate nearest neighbor (ANN) search:

Given a query vector q, find the k vectors in the database that are most similar to q.
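What "most similar" means depends on the metric. As a rough illustration (plain NumPy, not Milvus code), the sketch below ranks a few 2-D vectors under the three metric types used in this article's examples: for L2 a smaller distance means more similar, while for IP and COSINE a larger score means more similar. The helper name `top_k` is hypothetical.

```python
import numpy as np

def top_k(query, vectors, k, metric="L2"):
    """Return (indices, scores) of the k most similar vectors under the given metric."""
    if metric == "L2":
        # Euclidean distance: smaller means more similar
        scores = np.linalg.norm(vectors - query, axis=1)
        order = np.argsort(scores)
    elif metric == "IP":
        # inner product: larger means more similar (grows with vector length)
        scores = vectors @ query
        order = np.argsort(-scores)
    elif metric == "COSINE":
        # cosine similarity: inner product of the normalized vectors, larger means more similar
        scores = (vectors @ query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
        order = np.argsort(-scores)
    else:
        raise ValueError(f"unknown metric: {metric}")
    idx = order[:k]
    return idx, scores[idx]

vectors = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
for metric in ("L2", "IP", "COSINE"):
    print(metric, top_k(query, vectors, k=2, metric=metric)[0])
# L2 [0 3]
# IP [2 0]
# COSINE [0 3]
```

IP ranks the long vector [3, 3] first because the inner product grows with vector length, whereas COSINE only compares directions.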

Brute-Force Search vs. Index Search

Method	Time complexity	Search time on 100M vectors (rough order of magnitude)	Recall
Brute force (FLAT)	O(N)	~10 s	100%
Index search (HNSW)	~O(log N)	~10 ms	>95%

An index gives up a small amount of recall in exchange for search that is orders of magnitude faster.

The Core Idea of an Index

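┌─────────────────────────────────────────────────────────┐
│  Raw vector space (1 million vectors)                   │
│                                                         │
│  Brute force: compute distances to all 1M vectors       │
│                                                         │
│  Index search:                                          │
│  1. Preprocess: build a data structure (the index)      │
│  2. At query time: visit only a few candidates (~1000)  │
│  3. Return: the nearest vectors among those 1000        │
└─────────────────────────────────────────────────────────┘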

2. Overview of Index Types

Index types supported by Milvus

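Index type	Full name	Vector type	Characteristics
FLAT	Brute-force search	Dense	100% recall, slow
IVF_FLAT	Inverted File + FLAT	Dense	Balances speed and accuracy
IVF_SQ8	IVF + Scalar Quantization	Dense	~75% less memory
IVF_PQ	IVF + Product Quantization	Dense	90%+ less memory
HNSW	Hierarchical Navigable Small World	Dense	⭐ Fastest, most widely used
SCANN	Based on Google ScaNN	Dense	Google-optimized, for very large scale
SPARSE_INVERTED_INDEX	Sparse inverted index	Sparse	Designed for sparse vectors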

Quick selection guide

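Data size	Recommended index	Reason
< 100K	FLAT	Exact; an approximate index is unnecessary
100K - 10M	HNSW	Very fast, high recall
10M - 100M	HNSW / IVF_SQ8	Balances speed and memory
> 100M	IVF_PQ / SCANN	Maximum compression
Sparse vectors	SPARSE_INVERTED_INDEX	Purpose-built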

3. FLAT - Brute-Force Search

How it works

FLAT builds no index structure; it computes the distance between the query vector and every stored vector.

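Search procedure (a NumPy sketch using Euclidean distance):
import numpy as np

def flat_search(query, vectors, k):
    # compute the distance from the query to every stored vector
    distances = np.linalg.norm(vectors - query, axis=1)
    # keep the k smallest distances and return those vectors' indices
    top_k = np.argsort(distances)[:k]
    return top_k, distances[top_k]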

When to use it

✅ Very small datasets (< 10,000 vectors)
✅ 100% recall is required (e.g., financial risk control)
✅ Testing and debugging
❌ Large datasets in production

Code example

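from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Connect to a local Milvus instance (adjust host/port to your deployment)
connections.connect(host="localhost", port="19530")

# Define the schema and create the collection
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768)
]
schema = CollectionSchema(fields)
collection = Collection(name="flat_demo", schema=schema)  # example collection name

# Create a FLAT index: raw vectors are stored as-is and every search is brute force
index_params = {
    "index_type": "FLAT",
    "metric_type": "L2",  # Euclidean distance
    "params": {}
}

collection.create_index(field_name="embedding", index_params=index_params)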

4. IVF-Family Indexes

4.1 IVF_FLAT - Inverted File Index

How it works

IVF is based on partitioning the vector space:

┌─────────────────────────────────────────────────────────┐
│  Step 1: Training (k-means clustering)                  │
│                                                         │
│  Partition the space into nlist clusters (Voronoi cells)│
│                                                         │
│     ·   ·          ·   ·          ·   ·                 │
│       ●              ●              ●     ← centroids   │
│     ·   ·          ·   ·          ·   ·                 │
│                                                         │
│  Step 2: Search                                         │
│  1. Find the nprobe clusters nearest to the query       │
│  2. Search only the vectors in those nprobe clusters    │
│  3. Return the top-k results                            │
└─────────────────────────────────────────────────────────┘
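The two IVF steps above can be sketched in NumPy. This is a toy model assuming plain k-means and squared-L2 distance; `kmeans` and `ToyIVFFlat` are hypothetical names, and Milvus's real implementation (training sample size, storage layout, SIMD kernels) is not reproduced here.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignment)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        # assign every vector to its nearest centroid
        assign = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # move each centroid to the mean of its members (keep it if the cluster is empty)
        for c in range(k):
            members = x[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # final assignment against the final centroids
    assign = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return centroids, assign

class ToyIVFFlat:
    """Illustrative IVF_FLAT: k-means coarse quantizer plus per-cluster lists of raw vectors."""

    def __init__(self, vectors, nlist):
        self.vectors = vectors
        self.centroids, assign = kmeans(vectors, nlist)
        # inverted lists: cluster id -> ids of the vectors assigned to it
        self.lists = [np.flatnonzero(assign == c) for c in range(nlist)]

    def search(self, query, k, nprobe):
        # step 1: pick the nprobe clusters whose centroids are closest to the query
        probe = np.argsort(((self.centroids - query) ** 2).sum(axis=1))[:nprobe]
        # step 2: brute-force only the vectors stored in those clusters
        cand = np.concatenate([self.lists[c] for c in probe])
        d = ((self.vectors[cand] - query) ** 2).sum(axis=1)
        # step 3: return the top-k candidates by squared L2 distance
        return cand[np.argsort(d)[:k]]

rng = np.random.default_rng(42)
data = rng.random((2000, 16))
index = ToyIVFFlat(data, nlist=32)
query = rng.random(16)
exact = np.argsort(((data - query) ** 2).sum(axis=1))[:10]
for nprobe in (1, 4, 32):
    approx = index.search(query, k=10, nprobe=nprobe)
    print("nprobe", nprobe, "recall@10", len(set(approx.tolist()) & set(exact.tolist())) / 10)
```

Probing more clusters only adds candidates, so recall can only stay the same or improve as nprobe grows; with nprobe equal to nlist the search degenerates into brute force.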

Parameters

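Parameter	Meaning	Suggested value
`nlist`	Number of cluster centroids	4 × √N; e.g., 4000 for 1M vectors
`nprobe`	Number of clusters visited per search	10-100; larger is more accurate but slower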

Code example

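# Create the index
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}  # number of cluster centroids
}
collection.create_index(field_name="embedding", index_params=index_params)

# Load the collection into memory before searching
collection.load()

# Search parameters (query_vector is assumed to be a 768-dim list of floats)
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10}  # visit the 10 nearest clusters
}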
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=10
)

4.2 IVF_SQ8 - Scalar Quantization

IVF_SQ8 builds on IVF_FLAT and additionally compresses every vector with scalar quantization:

Original vector: float32 × dim = 4 × dim bytes
SQ8-compressed: 8-bit integer × dim = 1 × dim bytes

Compression: 75% less memory (a quarter of the original size)

Use when: memory is limited and a small loss of accuracy is acceptable
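A minimal sketch of 8-bit scalar quantization, assuming a simple per-dimension min/max mapping onto 256 levels (the exact training scheme used inside Milvus may differ; the helper names are hypothetical). It shows where the 4x memory reduction comes from and that the reconstruction error per component is bounded by half a quantization step.

```python
import numpy as np

def sq8_train(x):
    """Per-dimension minimum and step size so that [min, max] maps onto 0..255."""
    vmin = x.min(axis=0)
    scale = (x.max(axis=0) - vmin) / 255.0
    scale[scale == 0] = 1.0  # avoid division by zero on constant dimensions
    return vmin, scale

def sq8_encode(x, vmin, scale):
    # round each float32 component to the nearest of 256 levels -> 1 byte per dimension
    return np.clip(np.round((x - vmin) / scale), 0, 255).astype(np.uint8)

def sq8_decode(codes, vmin, scale):
    # map each level back to its float value
    return codes.astype(np.float32) * scale + vmin

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 768)).astype(np.float32)
vmin, scale = sq8_train(x)
codes = sq8_encode(x, vmin, scale)
print(x.nbytes, codes.nbytes)  # 3072000 768000
err = np.abs(sq8_decode(codes, vmin, scale) - x).max()
print("max error within half a step:", err <= scale.max() / 2 + 1e-4)
```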

index_params = {
    "index_type": "IVF_SQ8",
    "metric_type": "L2",
    "params": {"nlist": 128}
}

4.3 IVF_PQ - Product Quantization

A more aggressive compression scheme:

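How it works:
1. Split each high-dimensional vector into m sub-vectors
2. For each subspace, train a k-means codebook with 2^nbits centroids
3. Store each sub-vector as the nbits-bit index of its nearest centroid instead of its raw values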

Compression: down to 1/16 to 1/32 of the original size

Parameter	Meaning	Suggested value
`m`	Number of sub-vectors	Must divide dim; e.g., m=16 for dim=128
`nbits`	Bits per sub-vector code	8

index_params = {
    "index_type": "IVF_PQ",
    "metric_type": "L2",
    "params": {
        "nlist": 128,
        "m": 16,      # number of sub-vectors
        "nbits": 8    # bits per sub-vector code
    }
}
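The three PQ steps can be sketched in NumPy as follows. This is a toy version assuming plain k-means per subspace; `train_pq`, `encode_pq` and `decode_pq` are hypothetical helpers, not Milvus or Faiss APIs. With dim=128, m=16 and nbits=8, each vector shrinks from 512 bytes to 16 bytes, the 1/32 end of the range quoted above.

```python
import numpy as np

def train_pq(x, m, nbits, iters=10, seed=0):
    """Train one k-means codebook with 2**nbits centroids per subspace -> (m, 2**nbits, dim // m)."""
    n, dim = x.shape
    assert dim % m == 0, "m must divide dim"
    ksub, dsub = 2 ** nbits, dim // m
    assert n >= ksub, "need at least 2**nbits training vectors"
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, ksub, dsub), dtype=x.dtype)
    for j in range(m):
        sub = x[:, j * dsub:(j + 1) * dsub]
        cent = sub[rng.choice(n, ksub, replace=False)].copy()
        for _ in range(iters):
            assign = ((sub[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(ksub):
                members = sub[assign == c]
                if len(members):
                    cent[c] = members.mean(0)
        codebooks[j] = cent
    return codebooks

def encode_pq(x, codebooks):
    """Replace each sub-vector with the id of its nearest centroid (1 byte when nbits=8)."""
    m, ksub, dsub = codebooks.shape
    codes = np.empty((len(x), m), dtype=np.uint8 if ksub <= 256 else np.uint16)
    for j in range(m):
        sub = x[:, j * dsub:(j + 1) * dsub]
        codes[:, j] = ((sub[:, None, :] - codebooks[j][None]) ** 2).sum(-1).argmin(1)
    return codes

def decode_pq(codes, codebooks):
    # concatenate the selected centroid of every subspace
    return np.concatenate([codebooks[j][codes[:, j]] for j in range(codes.shape[1])], axis=1)

rng = np.random.default_rng(0)
x = rng.random((1000, 128)).astype(np.float32)
codebooks = train_pq(x, m=16, nbits=8)
codes = encode_pq(x, codebooks)
print(x.nbytes // len(x), codes.nbytes // len(x))  # 512 16
```

The codebooks (m × 2^nbits centroids) are stored once for the whole index, so they are not part of the per-vector figure.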

5. HNSW - Graph Index ⭐ Most Widely Used

5.1 How it works

HNSW (Hierarchical Navigable Small World) builds a multi-layer navigation graph:

┌─────────────────────────────────────────────────────────┐
│  HNSW multi-layer graph                                 │
│                                                         │
│  Layer 2 (sparsest):                                    │
│      ○───────────────○         long-range links         │
│      │               │                                  │
│  Layer 1 (middle):   │                                  │
│      ○─────○─────────○───○                              │
│      │     │         │   │                              │
│  Layer 0 (densest, contains every data point):          │
│  ○─○─○─○─○─○─○─○─○─○─○─○─○                              │
│                                                         │
│  Search procedure:                                      │
│  1. Start from the entry point in the top layer         │
│  2. Greedily move to the closest node, layer by layer   │
│  3. Run an ef-wide beam search in layer 0 for the top-k │
└─────────────────────────────────────────────────────────┘
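The search procedure in the diagram can be sketched as follows. To stay short, this toy version assumes a simplified construction: nodes get HNSW-style random levels (the probability of reaching level l is M^-l), but each layer is just a brute-force kNN graph rather than the incremental insertion HNSW actually uses. The search part follows the HNSW idea: greedy descent with ef=1 on the upper layers, then a best-first search keeping ef candidates on layer 0. All function names are hypothetical.

```python
import heapq
import numpy as np

def build_toy_layers(vectors, M=8, seed=0):
    """Random HNSW-style levels + a brute-force kNN graph per layer (NOT HNSW's insertion algorithm)."""
    rng = np.random.default_rng(seed)
    levels = rng.geometric(1 - 1 / M, size=len(vectors)) - 1  # P(level >= l) = M**-l
    layers = []
    for lvl in range(levels.max() + 1):
        nodes = np.flatnonzero(levels >= lvl)
        sub = vectors[nodes]
        d = ((sub[:, None, :] - sub[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)
        nbrs = np.argsort(d, axis=1)[:, :min(M, len(nodes) - 1)]
        layers.append({int(nodes[i]): [int(nodes[j]) for j in nbrs[i]] for i in range(len(nodes))})
    return layers, int(np.argmax(levels))  # entry point = a node on the top layer

def search_layer(query, vectors, graph, entry_points, ef):
    """Best-first search within one layer, keeping the ef closest nodes found."""
    def dist(i):
        return float(((vectors[i] - query) ** 2).sum())
    visited = set(entry_points)
    candidates = [(dist(e), e) for e in entry_points]  # min-heap: closest first
    heapq.heapify(candidates)
    best = [(-d, e) for d, e in candidates]  # max-heap via negation: farthest first
    heapq.heapify(best)
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # closest unexplored candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-d, i) for d, i in best)  # [(distance, id), ...] ascending

def hnsw_search(query, vectors, layers, entry, k, ef):
    ep = [entry]
    for graph in reversed(layers[1:]):  # upper layers: greedy descent with ef=1
        ep = [search_layer(query, vectors, graph, ep, ef=1)[0][1]]
    found = search_layer(query, vectors, layers[0], ep, ef=max(ef, k))  # layer 0 holds every point
    return [i for _, i in found[:k]]

rng = np.random.default_rng(1)
data = rng.random((500, 16))
layers, entry = build_toy_layers(data, M=8)
print("nodes per layer:", [len(g) for g in layers])
query = rng.random(16)
approx = hnsw_search(query, data, layers, entry, k=10, ef=64)
exact = np.argsort(((data - query) ** 2).sum(1))[:10]
print("recall@10:", len(set(approx) & set(exact.tolist())) / 10)
```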

5.2 Parameters

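Parameter	Meaning	Suggested value	Effect
`M`	Maximum number of neighbors per node	8-64	Larger: denser graph, usually higher recall, more memory, slower build
`efConstruction`	Candidate list size during construction	64-400	Larger: better graph quality, slower build
`ef`	Candidate list size during search	≥ top_k	Larger: higher recall, slower search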

5.3 Suggested parameter settings

Data size     M      efConstruction    ef (search)
─────────────────────────────────────────────
< 1M          16         200            64
1M - 100M     32         400            128
> 100M        64         400            256

5.4 Code example

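# Create an HNSW index
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",  # or L2, IP
    "params": {
        "M": 16,              # maximum connections per node
        "efConstruction": 200 # candidate list size during construction
    }
}
collection.create_index(field_name="embedding", index_params=index_params)

# Load the collection into memory
collection.load()

# Search with the ef parameter (query_vector is assumed to be a 768-dim list of floats)
search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64}     # candidate list size; must be >= limit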
}
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=10
)

5.5 HNSW vs. IVF

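Aspect	HNSW	IVF_FLAT
Search speed	⭐⭐⭐ Very fast	⭐⭐ Fast
Recall (typical, parameter-dependent)	⭐⭐⭐ >95%	⭐⭐ ~90%
Memory usage	⭐ High (raw vectors plus graph neighbor lists)	⭐⭐ Moderate
Build time	⭐ Slow	⭐⭐ Moderate
Incremental updates	⭐ Poorly supported	⭐⭐⭐ Well supported
Typical scale	Millions to tens of millions	Millions to hundreds of millions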
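HNSW's extra memory comes mainly from its neighbor lists. A back-of-the-envelope estimate, assuming an hnswlib-style layout (float32 vectors, up to 2·M four-byte neighbor ids per node on layer 0, upper layers and per-node bookkeeping ignored), suggests the relative overhead depends strongly on dimension. `hnsw_memory_estimate` is a hypothetical helper, and actual usage inside Milvus will differ.

```python
def hnsw_memory_estimate(num_vectors, dim, M, bytes_per_id=4):
    """Rough (raw_bytes, link_bytes, total/raw ratio) for an HNSW index; see assumptions above."""
    raw = num_vectors * dim * 4                 # float32 vectors
    links = num_vectors * 2 * M * bytes_per_id  # layer-0 adjacency lists dominate
    return raw, links, (raw + links) / raw

for dim in (128, 768):
    raw, links, ratio = hnsw_memory_estimate(1_000_000, dim, M=16)
    print(f"dim={dim}: vectors {raw / 1e9:.2f} GB, links {links / 1e9:.3f} GB, total/raw {ratio:.2f}x")
# dim=128: vectors 0.51 GB, links 0.128 GB, total/raw 1.25x
# dim=768: vectors 3.07 GB, links 0.128 GB, total/raw 1.04x
```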

6. Sparse Vector Indexes

6.1 What is a sparse vector

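Dense vector: fixed dimensionality, with a value in every dimension
  [0.1, 0.3, 0.0, 0.8, 0.2, ...]  # 768 dims, mostly non-zero

Sparse vector: very high dimensionality, but mostly zeros
  {1234: 0.8, 5678: 0.3, 9999: 0.5}  # ~30K-token vocabulary, only 3 non-zero entries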
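With the {dimension_index: value} representation shown above, only dimensions that are non-zero in both vectors contribute to the inner product, which is what makes IP cheap to compute on sparse vectors. A minimal sketch (`sparse_ip` is a hypothetical helper):

```python
def sparse_ip(a, b):
    """Inner product of two sparse vectors stored as {dimension_index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b[i] for i, v in a.items() if i in b)

doc = {1234: 0.8, 5678: 0.3, 9999: 0.5}
query = {5678: 2.0, 9999: 1.0, 42: 0.7}
print(sparse_ip(doc, query))  # only dimensions 5678 and 9999 contribute
```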

How they are generated:

SPLADE: a BERT-based sparse encoder
BGE-M3: outputs both dense and sparse vectors

6.2 SPARSE_INVERTED_INDEX

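An inverted index designed specifically for sparse vectors (the SPARSE_FLOAT_VECTOR field type it indexes was introduced in Milvus 2.4):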

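from pymilvus import Collection, CollectionSchema, FieldSchema, DataType

# Define a sparse vector field (assumes connections.connect(...) has already been called)
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="sparse_vec", dtype=DataType.SPARSE_FLOAT_VECTOR)
]
schema = CollectionSchema(fields)
collection = Collection(name="sparse_demo", schema=schema)  # example collection name

# Create the sparse vector index
index_params = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP",  # inner product
    "params": {
        "drop_ratio_build": 0.2  # fraction of small values dropped when building the index
    }
}
collection.create_index(field_name="sparse_vec", index_params=index_params)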
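The idea behind an inverted index for sparse vectors can be illustrated with a toy Python class: each dimension keeps a posting list of (document id, weight) pairs, and a query only walks the posting lists of its own non-zero dimensions. This is a sketch under stated assumptions, not Milvus's SPARSE_INVERTED_INDEX; in particular, the value pruning controlled by drop_ratio_build is left out, and the class name is hypothetical.

```python
from collections import defaultdict

class ToySparseInvertedIndex:
    """Illustrative inverted index for sparse vectors, scored by inner product (IP)."""

    def __init__(self):
        self.postings = defaultdict(list)  # dimension index -> [(doc_id, weight), ...]

    def add(self, doc_id, vec):
        for dim, weight in vec.items():
            self.postings[dim].append((doc_id, weight))

    def search(self, query, k):
        scores = defaultdict(float)
        # only the posting lists of the query's non-zero dimensions are touched
        for dim, q_weight in query.items():
            for doc_id, weight in self.postings.get(dim, ()):
                scores[doc_id] += q_weight * weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

index = ToySparseInvertedIndex()
index.add(0, {1234: 0.8, 5678: 0.3, 9999: 0.5})
index.add(1, {5678: 0.9, 42: 0.1})
index.add(2, {7: 1.0})
print(index.search({5678: 1.0, 9999: 1.0}, k=2))  # doc 1 ranks above doc 0; doc 2 is never scored
```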

6.3 Hybrid Search (Dense + Sparse)

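from pymilvus import (
    AnnSearchRequest, Collection, CollectionSchema, DataType, FieldSchema, RRFRanker
)

# Define both vector types in one collection
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="dense_vec", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="sparse_vec", dtype=DataType.SPARSE_FLOAT_VECTOR)
]
collection = Collection(name="hybrid_demo", schema=CollectionSchema(fields))  # example collection name

# Create an index on each vector field
collection.create_index("dense_vec", {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200}
})

collection.create_index("sparse_vec", {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP",
    "params": {"drop_ratio_build": 0.2}
})
collection.load()

# Hybrid search (Collection.hybrid_search requires Milvus 2.4+).
# dense_query is assumed to be a 768-dim float list, sparse_query a {dimension_index: weight} dict.
dense_req = AnnSearchRequest(
    data=[dense_query],
    anns_field="dense_vec",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=100
)

sparse_req = AnnSearchRequest(
    data=[sparse_query],
    anns_field="sparse_vec",
    # search-time pruning uses drop_ratio_search; drop_ratio_build only applies when building
    param={"metric_type": "IP", "params": {"drop_ratio_search": 0.2}},
    limit=100
)

# Fuse the two result lists with Reciprocal Rank Fusion (RRF)
results = collection.hybrid_search(
    reqs=[dense_req, sparse_req],
    rerank=RRFRanker(),
    limit=10
)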
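RRF fuses the dense and sparse result lists by rank rather than by raw score, so the two incompatible scales (COSINE similarity vs. sparse inner product) never have to be compared directly. A minimal sketch of Reciprocal Rank Fusion, assuming the usual formula score(d) = Σ 1/(k + rank) with 1-based ranks and k = 60, a commonly used constant; `rrf_fuse` is hypothetical and not the pymilvus implementation.

```python
def rrf_fuse(ranked_lists, k=60, limit=10):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), ranks starting at 1."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]

dense_hits = ["a", "b", "c"]   # ids ordered by dense similarity
sparse_hits = ["b", "c", "d"]  # ids ordered by sparse similarity
print(rrf_fuse([dense_hits, sparse_hits]))  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") accumulate two contributions and therefore rise above documents found by only one retriever.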

7. Index Selection Guide

7.1 Decision flowchart

Fewer than 100K vectors?
  ├─ Yes → FLAT (exact)
  └─ No → Need 100% recall?
           ├─ Yes → FLAT
           └─ No → Enough memory?
                    ├─ Yes → HNSW (fastest)
                    └─ No → More than 100M vectors?
                             ├─ Yes → IVF_PQ (compression)
                             └─ No → IVF_FLAT (balanced)
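The flowchart can be written as a small function, which makes the order of the checks explicit. `choose_index` is a hypothetical helper; its thresholds are the ones from the flowchart, not limits enforced by Milvus.

```python
def choose_index(num_vectors, need_exact=False, memory_sufficient=True):
    """Walk the selection flowchart and return a recommended Milvus index type."""
    if num_vectors < 100_000 or need_exact:
        return "FLAT"      # small data, or 100% recall required
    if memory_sufficient:
        return "HNSW"      # fastest when the graph fits in memory
    # memory-constrained: compress hard only at very large scale
    return "IVF_PQ" if num_vectors > 100_000_000 else "IVF_FLAT"

print(choose_index(5_000_000))                                 # HNSW
print(choose_index(500_000_000, memory_sufficient=False))      # IVF_PQ
```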

7.2 Scenario matching

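Scenario	Recommended index	Reason
Recommender system (real-time)	HNSW	Low latency, good user experience
Image retrieval	HNSW	High-dimensional feature vectors, where HNSW performs well
Semantic text search	HNSW + sparse vectors	Hybrid search gives better results
Log analysis	IVF_FLAT	Large data volume; somewhat higher latency is acceptable
Embedded devices	IVF_SQ8/PQ	Memory-constrained
Financial risk control	FLAT	Requires 100% recall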

8. Parameter Tuning

8.1 General principles

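Recall too low → increase the search parameter (nprobe/ef)
Search too slow → decrease the search parameter, or switch to a faster index
Not enough memory → use a quantized index (SQ8/PQ)
Index build too slow → decrease M or efConstruction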

8.2 HNSW tuning

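# Speed-oriented (somewhat lower recall)
index_params = {"index_type": "HNSW", "metric_type": "COSINE", "params": {"M": 8, "efConstruction": 64}}
search_params = {"metric_type": "COSINE", "params": {"ef": 32}}

# Recall-oriented (somewhat slower)
index_params = {"index_type": "HNSW", "metric_type": "COSINE", "params": {"M": 32, "efConstruction": 400}}
search_params = {"metric_type": "COSINE", "params": {"ef": 256}}

# Balanced (recommended)
index_params = {"index_type": "HNSW", "metric_type": "COSINE", "params": {"M": 16, "efConstruction": 200}}
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}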

8.3 IVF tuning

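# nlist heuristic: 4 * sqrt(N)
import math

num_vectors = 1_000_000  # example data size
# Milvus accepts nlist values in [1, 65536], so clamp the heuristic to that range
nlist = min(65536, max(1, 4 * int(math.sqrt(num_vectors))))  # 4000 for 1M vectors

# Tuning nprobe
# Recall test: increase nprobe step by step until recall meets the requirement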
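The recall test can be automated. A minimal sketch, assuming you already have the exact top-k ids for a batch of test queries (for example from a FLAT index) and a hypothetical `search_fn(nprobe)` that runs the same queries through the IVF index and returns one id list per query; `recall_at_k` and `tune_nprobe` are hypothetical helpers.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / sum(len(e) for e in exact_ids)

def tune_nprobe(search_fn, exact_ids, target=0.95, candidates=(8, 16, 32, 64, 128, 256)):
    """Return (smallest candidate nprobe reaching the target recall, its recall).

    search_fn(nprobe) could wrap collection.search(..., param={"metric_type": "L2",
    "params": {"nprobe": nprobe}}) and return [[hit.id for hit in hits] for hits in results].
    """
    recall = 0.0
    for nprobe in candidates:
        recall = recall_at_k(search_fn(nprobe), exact_ids)
        if recall >= target:
            return nprobe, recall
    return None, recall  # target not reached: try larger nprobe values or another index

# Usage with a stand-in search function whose recall improves as nprobe grows
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
def fake_search(nprobe):
    n = min(4, nprobe // 16)  # number of true neighbors "found"
    return [e[:n] + [-1] * (4 - n) for e in exact]
print(tune_nprobe(fake_search, exact))  # (64, 1.0)
```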

9. Code Examples

9.1 Complete example: HNSW index

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
import numpy as np

# Connect to Milvus
connections.connect(host="localhost", port="19530")

# Define the fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768)
]

# Create the collection
schema = CollectionSchema(fields, description="Document vector store")
collection = Collection(name="documents", schema=schema)

# Insert data (auto_id=True, so only the text and embedding columns are supplied)
num_entities = 10000
data = [
    [f"doc_{i}" for i in range(num_entities)],
    np.random.random((num_entities, 768)).tolist()
]
collection.insert(data)

# Create the HNSW index
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200}
}
collection.create_index(field_name="embedding", index_params=index_params)

# Load into memory
collection.load()

# Search
query_vector = np.random.random(768).tolist()
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}

results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=5,
    output_fields=["text"]
)

for hits in results:
    for hit in hits:
        # with COSINE, hit.distance is a similarity score: larger means more similar
        print(f"ID: {hit.id}, Distance: {hit.distance}, Text: {hit.entity.get('text')}")

# Release resources
collection.release()
connections.disconnect("default")

9.2 Index management

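from pymilvus import utility

# View index information
print(collection.indexes)

# Check index build progress
print(utility.index_building_progress("documents"))

# Check the collection's load state (this reports loading status, not index status)
print(utility.load_state("documents"))

# Drop the index (release the collection first; an index cannot be dropped while the collection is loaded)
collection.release()
collection.drop_index()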

Appendix: Glossary

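Term	Full name	Meaning
ANN	Approximate Nearest Neighbor	Approximate nearest neighbor search
IVF	Inverted File	Inverted file (cluster-based partitioning)
PQ	Product Quantization	Splits vectors into sub-vectors and encodes each with a codebook
SQ	Scalar Quantization	Compresses each dimension to a low-bit integer
HNSW	Hierarchical Navigable Small World	Multi-layer navigable graph index
nlist	-	Number of IVF cluster centroids
nprobe	-	Number of clusters IVF visits per search
M	-	Maximum connections per HNSW node
ef	-	HNSW search candidate list size
Recall	-	Fraction of the true nearest neighbors that are returned
Latency	-	Time to answer a single query
Throughput	-	Queries served per unit of time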

Document generated: 2026-04-10
Applicable Milvus version: 2.3+
