Key Points of E-commerce Search

Preface

My knowledge here is limited, so this article does not dig into the internals of real e-commerce search systems. What follows is mainly a summary of what I gathered from web searches and AI Q&A. Please bear with me, and corrections are welcome!

Search on the major e-commerce platforms can be roughly divided into three stages:
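1. Query understanding: correct, rewrite, expand, and segment the keywords the user enters.
2. Recall: retrieve a valid, relevant candidate set of products from the product catalog based on the query terms.
3. Ranking: given the recalled candidate set, order the products according to many factors.

JD.com talk, "Semantic Retrieval and Product Ranking in E-commerce Search": https://zhuanlan.zhihu.com/p/465504164
Paper: https://arxiv.org/pdf/2006.02282

Figure 1

Taobao search, related paper: "Embedding-based Product Retrieval in Taobao Search"

Figure 2

Query understanding stage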
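Most major e-commerce search systems now apply NLP (natural language processing). The main flow of the query stage is: spelling correction → word segmentation → category prediction.

| Step | Main purpose | Techniques used |
| --- | --- | --- |
| Spelling correction | Fix pinyin input, wrong characters, and typos | Edit distance, pinyin mapping, language models, BERT, etc. |
| Word segmentation | Split continuous text into meaningful word units | Dictionary-based segmentation, CRF, BiLSTM-CRF, BERT, etc. |
| Category prediction | Identify the product category the query refers to | TextCNN, FastText, BERT classifiers, vector matching |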
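The table lists edit distance as one of the techniques for spelling correction. The following is only a minimal sketch of that idea, not how any real platform does it: it assumes a small in-memory vocabulary (the `vocabulary` list, the `SpellCorrector` class, and the `maxDistance` threshold are all hypothetical) and picks the closest term by Levenshtein distance.

```java
import java.util.List;
import java.util.Optional;

class SpellCorrector {

    // Classic Levenshtein distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest vocabulary term within maxDistance, or empty if none is close enough.
    static Optional<String> correct(String query, List<String> vocabulary, int maxDistance) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String term : vocabulary) {
            int d = editDistance(query, term);
            if (d < bestDist) {
                bestDist = d;
                best = term;
            }
        }
        return bestDist <= maxDistance ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary; a real system would draw it from query logs or the catalog.
        List<String> vocabulary = List.of("iphone", "huawei", "xiaomi");
        System.out.println(correct("iphnoe", vocabulary, 2));
    }
}
```

A linear scan over the vocabulary is fine for a demo. For Chinese queries, correction would usually also need the pinyin mapping and language models mentioned in the table.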
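Recall stage

This stage builds on the output of the query stage. If the user explicitly selected a product category, recall is restricted to that selection; otherwise products are filtered by the category predicted in the previous stage.

Multi-channel recall

| Recall channel | How it works | Example |
| --- | --- | --- |
| Keyword match recall | Inverted index lookup on the segmented terms | A query containing "Nike shoes" (耐克鞋) hits Nike products |
| Category recall | Match products by category ID | Predicted category is "sports shoes", so products in that category are recalled |
| Attribute recall | Match attributes such as color, size, and function | "summer non-slip" recalls sandal-type products |
| Brand recall | Match the recognized brand against the product brand field | "Apple phone case" (苹果手机壳) recalls Apple accessories |
| Vector recall | Convert the query into an embedding vector | "influencer-style dress" → semantic retrieval of similar products |
| Behavior history recall | Use the user profile and click history | You often search for "iPhone case" → related products are prioritized |
| Synonym expansion recall | Match products against rewritten and expanded query terms | "jeans" (牛仔裤) is expanded to "牛仔长裤" and "jeans" |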
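Multi-channel recall makes it as likely as possible that relevant products are hit, and it effectively avoids empty search results.

A complete example

Suppose the user enters 苹果13手机壳透明防摔 ("Apple 13 phone case transparent shockproof"). After query processing we get:

| Field | Value |
| --- | --- |
| Brand | Apple |
| Product intent | Accessory: phone case |
| Attributes | Transparent, shockproof |
| Category prediction | Phone accessories > Phone cases |

Recall then runs through the following channels:

| Channel type | Recalled products |
| --- | --- |
| Keyword recall | Products matching "Apple" and "phone case" |
| Category recall | All products in the "phone case" category |
| Attribute recall | Products tagged "shockproof" and "transparent" |
| Brand recall | Accessories under the Apple brand |
| Vector recall | Near-synonym products containing "protective case" or "cover" |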
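To make the merge step of multi-channel recall concrete, here is a rough sketch under stated assumptions: each channel is assumed to return a list of product IDs, and the merged candidate set is deduplicated and ordered by how many channels recalled each product. The class name, channel names, and product IDs are hypothetical, and ordering by channel count is only an illustrative choice, since in a real system the ranking stage decides the final order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class MultiChannelRecall {

    // Merges the ID lists returned by each channel, removes duplicates,
    // and orders candidates by the number of channels that recalled them.
    static List<String> merge(Map<String, List<String>> channelResults) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (List<String> ids : channelResults.values()) {
            // A product counts at most once per channel.
            for (String id : new LinkedHashSet<>(ids)) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        List<String> merged = new ArrayList<>(hits.keySet());
        // List.sort is stable, so ties keep their first-seen order.
        merged.sort((x, y) -> hits.get(y) - hits.get(x));
        return merged;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the channel order deterministic for the demo.
        Map<String, List<String>> channels = new LinkedHashMap<>();
        channels.put("keyword", List.of("p1", "p2"));
        channels.put("category", List.of("p2", "p3"));
        channels.put("attribute", List.of("p2", "p1"));
        System.out.println(merge(channels));
    }
}
```

Because the result is the union of all channels, the candidate set is empty only if every channel returns nothing, which is the property the text attributes to multi-channel recall.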
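Ranking stage

Overall structure of the ranking stage

Ranking is generally split into three levels, from coarse to fine:

Pre-ranking (coarse ranking) → Ranking (fine ranking, the main ranking) → Re-ranking (personalization / multi-objective)

1️⃣ Pre-ranking (coarse ranking)

Purpose: quickly filter out clearly irrelevant or low-quality products to reduce the computational load
Models: lightweight models such as LR, tree-based models, Wide&Deep
Features: number of query terms matched, category match, price filtering, policy violations, etc.

2️⃣ Ranking (fine ranking, the main ranking)

Purpose: the main scoring step, which evaluates the value of each product in detail
Models: deep learning models for CTR/CVR prediction, such as DNN, DSSM, DIN, DeepFM, Transformer
Example features:

| Category | Example features |
| --- | --- |
| Query features | Query length, part of speech, intent type |
| Product features | Price, sales volume, reviews, brand, stock |
| User features | Historical interests, gender, age, recent browsing behavior, etc. |
| User-product | Whether the user has clicked the product, whether the user has favorited it, etc. |
| Query-product | Term overlap, category relevance, brand consistency, etc. |

Objective: maximize CTR (click-through rate) or GMV (gross merchandise volume)

3️⃣ Re-ranking

Purpose: improve multi-objective performance and strengthen personalization and diversity
Operations:
Add cold-start protection and new-product support, and spread out repeated brands
Introduce contextual features such as the user's recent browsing behavior
Apply business rules such as ad insertion, blocklist filtering, and campaign priority

Special scenarios that ranking must also consider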
| Scenario | Description |
| --- | --- |
| Ad ranking fusion | Search ad products are mixed with the organic ranking |
| Cold-start protection for new products | Prevents new products from ranking low just because they have few clicks |
| Business score fusion | The platform can insert designated products and campaign products |

Simulating e-commerce search with ES + Redis

With my limited skills, I simply use the IK analyzer for word segmentation and ES for fuzzy matching. If you have ideas for improvement, please share them in the comments. I only recently started learning ES and built this small demo on a whim; the results are shown at the end of the article.

GitHub: https://github.com/Aeroeia/ElasticSearch
If you find it useful, feel free to give it a star.

Architecture diagram (nodes: user interface, web server, Elasticsearch, Redis cache, MySQL database; edge labels: cache search results, sync data)

Environment setup
JDK 11, Maven, Redis, MySQL, Elasticsearch 7.x, IK analyzer

ES index structure
    PUT /item
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "suggest_analyzer": {
              "tokenizer": "keyword",
              "filter": ["lowercase", "trim"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "id": { "type": "keyword" },
          "name": { "type": "text", "analyzer": "ik_max_word" },
          "brand": { "type": "keyword" },
          "category": { "type": "keyword" },
          "suggest_keywords": {
            "type": "completion",
            "analyzer": "suggest_analyzer",
            "preserve_separators": false,
            "preserve_position_increments": false
          },
          "price": { "type": "integer", "index": false },
          "image": { "type": "keyword", "index": false }
        }
      }
    }

The MySQL table has essentially the same fields, except that it has no suggest_keywords column.
Backend implementation (Spring Boot + RestHighLevelClient + Spring Cache (Redis))

Maven dependencies

    <dependencies>
        <!-- Elasticsearch client -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.12.1</version>
        </dependency>
        <!-- Database connection pool -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid-spring-boot-starter</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- Redis -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
        </dependency>
        <!-- MySQL driver -->
        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
            <scope>runtime</scope>
        </dependency>
        <!-- Spring Web -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.30</version>
        </dependency>
        <!-- MyBatis Plus -->
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.5.6</version>
        </dependency>
        <!-- Spring Boot base dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <!-- Spring Boot test dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- Hutool utility library -->
        <dependency>
            <groupId>cn.hutool</groupId>
            <artifactId>hutool-all</artifactId>
            <version>5.8.11</version>
        </dependency>
    </dependencies>

Configuration class
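    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class EsConfiguration {

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            return new RestHighLevelClient(RestClient.builder(
                    HttpHost.create("192.168.112.128:9200") // replace with the host and port of your own ES
            ));
        }
    }

Controller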
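    import java.util.List;

    import lombok.RequiredArgsConstructor;
    import lombok.extern.slf4j.Slf4j;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    // IItemService and ItemVO come from the project's own packages.
    @RestController
    @Slf4j
    @RequestMapping("/es")
    @RequiredArgsConstructor
    public class EsController {

        private final IItemService itemService;

        // Returns search suggestions for the given keyword prefix
        @GetMapping("/suggestions")
        public List<String> getSuggestions(@RequestParam String keyword) {
            log.info("keyword:{}", keyword);
            return itemService.getSuggestions(keyword);
        }

        // Product search endpoint; all three parameters are optional
        @GetMapping("/search")
        public List<ItemVO> search(@RequestParam(required = false) String keyword,
                                   @RequestParam(required = false) String brand,
                                   @RequestParam(required = false) String category) {
            log.info("keyword:{}", keyword);
            log.info("brand:{}", brand);
            log.info("category:{}", category);
            return itemService.search(keyword, brand, category);
        }
    }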
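For trying the two endpoints from a client, the request URL has to carry the optional parameters URL-encoded. Below is a small sketch of building the `/es/search` URL. The class and method names are hypothetical, and the base URL `http://localhost:8080` assumes Spring Boot's default port, which the article does not state.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SearchUriBuilder {

    // Builds the /es/search URI, skipping parameters that are null or blank.
    static URI buildSearchUri(String baseUrl, String keyword, String brand, String category) {
        List<String> params = new ArrayList<>();
        addParam(params, "keyword", keyword);
        addParam(params, "brand", brand);
        addParam(params, "category", category);
        String query = params.isEmpty() ? "" : "?" + String.join("&", params);
        return URI.create(baseUrl + "/es/search" + query);
    }

    private static void addParam(List<String> params, String name, String value) {
        if (value != null && !value.isBlank()) {
            // URLEncoder produces form encoding: spaces become '+', non-ASCII becomes %XX (UTF-8).
            params.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust to wherever the application actually runs.
        System.out.println(buildSearchUri("http://localhost:8080", "phone case", "Apple", null));
    }
}
```

The resulting URI can be sent with any HTTP client, for example `java.net.http.HttpClient`, which is available from JDK 11.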
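ServiceImpl

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    import cn.hutool.core.bean.BeanUtil;
    import cn.hutool.core.util.StrUtil;
    import cn.hutool.json.JSONUtil;
    import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
    import lombok.RequiredArgsConstructor;
    import lombok.extern.slf4j.Slf4j;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
    import org.elasticsearch.search.suggest.SuggestBuilder;
    import org.elasticsearch.search.suggest.SuggestBuilders;
    import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    // Item, EsItem, ItemVO, ItemMapper, IItemService and SuggestionsUtil come from the project's own packages.
    // @Cacheable only takes effect when caching is enabled (@EnableCaching).
    @Service
    @Slf4j
    @RequiredArgsConstructor
    public class ItemServiceImpl extends ServiceImpl<ItemMapper, Item> implements IItemService {

        private final RestHighLevelClient client;

        // Custom name of the suggestion entry in the request and response
        private final String suggestionName = "sug";

        @Override
        @Cacheable(value = "suggestions", key = "#keyword")
        public List<String> getSuggestions(String keyword) {
            try {
                // Build the suggestion request
                SuggestBuilder sb = new SuggestBuilder();
                sb.addSuggestion(suggestionName,
                        SuggestBuilders.completionSuggestion("suggest_keywords")
                                .prefix(keyword)
                                .size(10) // at most ten suggestions
                                .skipDuplicates(true));
                SearchSourceBuilder src = new SearchSourceBuilder().suggest(sb).size(0);
                SearchRequest req = new SearchRequest("item").source(src);

                // Execute the search
                SearchResponse resp = client.search(req, RequestOptions.DEFAULT);
                log.info("Search response: {}", resp);

                // Parse the suggestions
                CompletionSuggestion suggestion = resp.getSuggest().getSuggestion(suggestionName);
                if (suggestion == null) {
                    log.warn("No suggestions found");
                    return new ArrayList<>();
                }
                return suggestion.getOptions().stream().map(option -> {
                    String str = option.getText().toString();
                    // Highlight the keyword prefix. The end index is clamped so that a keyword
                    // longer than the suggestion cannot throw StringIndexOutOfBoundsException.
                    int end = Math.min(keyword.length(), str.length());
                    StringBuilder builder = new StringBuilder(str);
                    builder.insert(0, "<em>");
                    builder.insert(end + 4, "</em>");
                    return builder.toString();
                }).collect(Collectors.toList());
            } catch (IOException e) {
                log.error("Error while fetching search suggestions", e);
                throw new RuntimeException("Failed to fetch search suggestions", e);
            }
        }

        // Sync MySQL data to ES
        @Override
        public void syncData() {
            log.info("Start syncing data to ES");
            List<Item> list = this.list();
            BulkRequest request = new BulkRequest();
            for (Item item : list) {
                try {
                    List<String> suggestions = SuggestionsUtil.getSuggestions(
                            item.getName(), item.getBrand(), item.getCategory());
                    EsItem esItem = BeanUtil.copyProperties(item, EsItem.class);
                    // Build the structure the completion suggester expects
                    Map<String, Object> suggest = new HashMap<>();
                    suggest.put("input", suggestions);
                    suggest.put("weight", 10);
                    esItem.setSuggest_keywords(suggest);
                    String jsonString = JSONUtil.toJsonStr(esItem);
                    request.add(new IndexRequest("item")
                            .id(item.getId().toString())
                            .source(jsonString, XContentType.JSON));
                } catch (Exception e) {
                    log.error("Error while processing item, ID: {}", item.getId(), e);
                }
            }
            // An empty bulk request fails validation, so skip it
            if (request.numberOfActions() == 0) {
                log.info("No data to sync");
                return;
            }
            try {
                client.bulk(request, RequestOptions.DEFAULT);
                log.info("Data sync finished, {} records synced", list.size());
            } catch (IOException e) {
                log.error("Error during bulk sync", e);
                throw new RuntimeException("Data sync failed", e);
            }
        }

        @Override
        @Cacheable(value = "search", key = "#keyword + '-' + #brand + '-' + #category")
        public List<ItemVO> search(String keyword, String brand, String category) {
            SearchRequest request = new SearchRequest("item");
            // Match the keyword against name, brand, and category
            BoolQueryBuilder boolQueryBuilder;
            if (StrUtil.isNotBlank(keyword)) {
                boolQueryBuilder = QueryBuilders.boolQuery()
                        .must(QueryBuilders.multiMatchQuery(keyword, "name", "brand", "category"));
            } else {
                boolQueryBuilder = QueryBuilders.boolQuery().must(QueryBuilders.matchAllQuery());
            }
            // Brand and category act as exact-match filters
            if (StrUtil.isNotBlank(brand)) {
                boolQueryBuilder.filter(QueryBuilders.termQuery("brand", brand));
            }
            if (StrUtil.isNotBlank(category)) {
                boolQueryBuilder.filter(QueryBuilders.termQuery("category", category));
            }
            request.source().query(boolQueryBuilder)
                    .highlighter(SearchSourceBuilder.highlight().field("name"));
            request.source().size(10);

            SearchResponse response;
            try {
                response = client.search(request, RequestOptions.DEFAULT);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }

            SearchHit[] hits = response.getHits().getHits();
            List<ItemVO> list = new ArrayList<>();
            for (var hit : hits) {
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                HighlightField highlightField = highlightFields.get("name");
                String highlight = null;
                String source = hit.getSourceAsString();
                ItemVO itemVO = JSONUtil.toBean(source, ItemVO.class);
                // Fall back to the plain name when there is no highlight fragment
                if (highlightField != null) {
                    highlight = highlightField.getFragments()[0].toString();
                } else {
                    highlight = itemVO.getName();
                }
                itemVO.setHighlight(highlight);
                list.add(itemVO);
            }
            return list;
        }
    }

Utility class that generates suggest_keywords

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    import cn.hutool.core.util.StrUtil;

    public class SuggestionsUtil {

        public static List<String> getSuggestions(String name, String brand, String category) {
            // Simply split the name on whitespace; keep tokens of at least two characters
            // (this threshold is an assumption, adjust as needed)
            List<String> suggestions = Arrays.stream(name.split("\\s+"))
                    .filter(word -> word.length() >= 2)
                    .distinct()
                    .collect(Collectors.toCollection(ArrayList::new)); // mutable, elements are added below
            // Add brand and category directly to the suggestions if they are not blank
            if (StrUtil.isNotBlank(brand)) {
                suggestions.add(brand.trim());
            }
            if (StrUtil.isNotBlank(category)) {
                suggestions.add(category.trim());
            }
            return suggestions;
        }
    }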
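The suggestion code highlights the matched prefix by inserting `<em>` and `</em>` at fixed offsets, which relies on the suggestion being at least as long as the keyword. As a hedged alternative, the sketch below pulls that step into a standalone helper that only highlights when the suggestion really starts with the keyword (ignoring case, since the `suggest_analyzer` lowercases input). `PrefixHighlighter` and `highlightPrefix` are hypothetical names, not part of the demo project.

```java
class PrefixHighlighter {

    // Wraps the leading keyword-length prefix of the suggestion in <em> tags.
    // Returns the suggestion unchanged when it does not start with the keyword.
    static String highlightPrefix(String suggestion, String keyword) {
        if (keyword == null || keyword.isEmpty()
                || !suggestion.regionMatches(true, 0, keyword, 0, keyword.length())) {
            return suggestion;
        }
        return "<em>" + suggestion.substring(0, keyword.length()) + "</em>"
                + suggestion.substring(keyword.length());
    }

    public static void main(String[] args) {
        System.out.println(highlightPrefix("iPhone case", "iph"));
    }
}
```

`regionMatches` returns false when the keyword is longer than the suggestion, so the helper never indexes past the end of the string.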
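Spring Task: periodically sync MySQL data to ES

    import lombok.RequiredArgsConstructor;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    // @Scheduled only takes effect when scheduling is enabled (@EnableScheduling).
    @Component
    @RequiredArgsConstructor
    public class EsTask {

        private final IItemService itemService;
        private final RestHighLevelClient client;

        @Scheduled(cron = "0 */5 * * * ?") // runs every 5 minutes
        public void syncProductToEs() {
            itemService.syncData();
        }
    }

Results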