Machine Learning

Rasa Chatbot (Part 2): Training and Building

This article mainly describes how to build a chatbot with Rasa Core and Rasa NLU.
Code: https://github.com/xiaoxiong74/rasa_chatbot

Introduction
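This chatbot demo uses the open-source NLU framework rasa-nlu for intent recognition and entity extraction, and rasa-core for dialogue management and response generation.

The demo covers the following dialogues:
1: Subscribing to a plan, and checking phone charges and data usage (dialogue scenario 1)
2: Case lookup (dialogue scenario 2)
3: Q&A plus chitchat (merged into the unknow_intent scenario)
Demo implementation flow

The demo mainly draws on:
rasa_chatbot_cn
_rasa_chatbot
WeatherBot
Main package versions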
python: 3.6.8
rasa-nlu: 0.14.4
rasa-core: 0.13.2
rasa-core-sdk: 0.12.1
tensorflow 1.12.0
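Main files
data/rasa_dataset_training.json: NLU training data
configs/_config.yml files: model pipeline definitions (language, pipeline, etc.). The pipeline in nlu_model_config.yml can be customized; because the dataset is small, it uses an open-source method and word vectors (total_word_feature_extractor.dat). If your rasa_dataset_training.json contains enough data, you can try training the NLU model with the nlu_embedding_config.yml configuration (the one this demo uses).
mobile_domain.yml: the collection of definitions for all components and actions, which in effect are the features
endpoints.yml: service addresses and the conversation store address (url)
data/mobile_edit_story.md: defines the dialogue scenarios; this is the training data for conversation flows
bot.py: the methods for training the NLU and dialogue models
actions.py: executes custom Actions (usually concrete business operations; in this project, telecom service queries, case lookup, and chitchat or Q&A)
data/total_word_feature_extractor.dat: a pre-trained Chinese feature file (used when training with the nlu_model_config.yml configuration)
data/news_12g_baidubaike_20g_novel_90g_embedding_64.bin: a pre-trained word2vec model (used by train_nlu_wordvector via wordvector_config.yml). Larger pre-trained models can also be downloaded; download address: link, password: 9aza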
Command
train nlu model: train the NLU model (other targets are available, e.g. train-nlu-wordvector)
python bot.py train-nlu
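For readers who have not opened bot.py, the train-nlu step can be sketched roughly as follows. This is a minimal sketch and not the repository's actual code: the function name train_nlu, the default output directory and the fixed model name "current" are assumptions, chosen only to line up with the --path models/nlu and default/current values used in the commands in this article. load_data, config.load, Trainer.train and Trainer.persist come from the rasa_nlu 0.14 Python API.

```python
def train_nlu(config_path, data_path, model_dir="models/nlu", model_name="current"):
    """Train an NLU model and return the directory it was saved to.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    # Imported lazily so this module can be loaded without rasa_nlu installed.
    from rasa_nlu import config
    from rasa_nlu.model import Trainer
    from rasa_nlu.training_data import load_data

    training_data = load_data(data_path)         # e.g. the Rasa-format JSON file
    trainer = Trainer(config.load(config_path))  # builds the pipeline from the YAML config
    trainer.train(training_data)
    # With no project name given, rasa_nlu stores the model under the
    # "default" project, i.e. <model_dir>/default/<model_name>.
    return trainer.persist(model_dir, fixed_model_name=model_name)
```

A call might look like train_nlu("configs/nlu_embedding_config.yml", "data/rasa_dataset_training.json"), where the configs/ prefix is assumed from the file list above.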
test nlu model: test the NLU model, mainly to check that intents are recognized correctly and that entities are extracted
python -m rasa_nlu.server --path models/nlu  # start the NLU model server

curl -XPOST 192.168.109.232:5000/parse -d '{"q":"我要查昨天下午的抢劫案", "project": "default", "model": "current"}'
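The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names build_parse_request and parse are hypothetical, and the host, port, project and model defaults simply mirror the values in the curl command above.

```python
import json
from urllib import request


def build_parse_request(text, host="localhost", port=5000,
                        project="default", model="current"):
    """Build (but do not send) a POST request for the /parse endpoint."""
    payload = {"q": text, "project": project, "model": model}
    # ensure_ascii=False keeps the Chinese text readable in the request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return request.Request(
        "http://{}:{}/parse".format(host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse(text, **kwargs):
    """Send the request and decode the JSON reply (needs a running server)."""
    with request.urlopen(build_parse_request(text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the NLU server running, parse("我要查昨天下午的抢劫案", host="192.168.109.232") should return the parsed intent and entities as a dict.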
train dialogue: train the dialogue flow (other targets are available, e.g. train-dialogue-transformer)
python bot.py train-dialogue-keras
test dialogue: test the dialogue flow from the client side (starts the core client service)
python -m rasa_core_sdk.endpoint --actions actions &

python -m rasa_core.run --nlu default/current --core models/dialogue_keras --endpoints endpoints.yml
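dialogue: interactive training to generate new stories (in effect, building dialogue-scenario data yourself; new stories can be appended to the stories used in earlier training, the model retrained, and the process repeated)
python -m rasa_core.train interactive -o models/dialogue_keras -d mobile_domain.yml -s data/mobile_edit_story.md --endpoints endpoints.yml  # train stories from scratch (cold start)
python -m rasa_core.train interactive --core models/dialogue_keras --nlu default/current --endpoints endpoints.yml  # continue from an existing story model (to build more stories; this is the usual approach)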
provide dialogue service: server side, exposes the dialogue service API (start this service when connecting a channel such as web)
python -m rasa_core_sdk.endpoint --actions actions &

python -m rasa_core.run --nlu default/current --core models/dialogue_keras --credentials credentials.yml --endpoints endpoints.yml  # start the core service
compare policy
python -m rasa_core.train compare -c keras_policy.yml embed_policy.yml -d mobile_domain.yml -s data/mobile_edit_story.md -o comparison_models/ --runs 3 --percentages 0 25 50 70
evaluate policy
python -m rasa_core.evaluate compare -s data/mobile_edit_story.md --core comparison_models/ -o comparison_results/
Some tips
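Generating NLU training data in bulk
Building training data is very time-consuming. The demo's data/rasa_dataset_training.json was generated automatically from a set of rules, which saves a lot of manual effort.

Tool address: here,
See the usage documentation in chatito_gen_nlu_data for details.
For annotating corpora, see the annotation tool rasa-nlu-trainer.
UI integration
For UI integration, see https://github.com/howl-anderson/WeatherBot_UI; it works once you change the relevant port or IP.

Startup order:
1. Start the NLU service
2. Start the dialogue service
3. Start the web service
Read the official rasa_nlu and rasa_core documentation often.
There are some pitfalls along the way; if you run into any problem, feel free to open an issue at any time!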

Q&A
ner_duckling cannot be used
Starting with rasa_nlu 0.14.0, ner_duckling is no longer available (see the changelog); only ner_duckling_http remains. Because starting ner_duckling_http myself produced errors, I added the ner_duckling module back into rasa_nlu as follows:

1. Locate the rasa_nlu package; mine is at /root/anaconda3/envs/rasa/lib/python3.6/site-packages/rasa_nlu
2. Add duckling_extractor.py to rasa_nlu/extractors (leading path omitted) by copying it directly from https://github.com/RasaHQ/rasa_nlu/blob/0.13.x/rasa_nlu/extractors/duckling_extractor.py
3. Register the duckling_extractor component in rasa_nlu/registry.py
Import: from rasa_nlu.extractors.duckling_extractor import DucklingExtractor
Add the component: append DucklingExtractor to the component_classes list
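train_dialogue_transformer fails with a dimension mismatch error
policy/attention_keras requires an even number of input features, that is, the number of features defined in mobile_domain.yml. If the error occurs, remove or add one feature.
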
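train_nlu_wordvector fails with an encoding error
rasa_nlu_gao loads the word2vec model in text (txt) format, whereas I use a binary (bin) model here, so using a binary model requires changing the rasa_nlu_gao source code as follows:

1. Open site-packages/rasa_nlu_gao/featurizers/intent_featurizer_wordvector.py
2. Find the two places where the model is loaded with model = gensim.models.KeyedVectors.load_word2vec_format and change binary to True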
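If it is unclear which format a downloaded embedding file uses, a quick check before patching can help. The sketch below is only a heuristic and is not part of rasa_nlu_gao or gensim: the helper name looks_like_binary_word2vec is hypothetical, and it assumes the standard word2vec layout (a "vocab_size dim" header line, followed either by text rows or by packed 32-bit floats).

```python
def looks_like_binary_word2vec(path):
    """Heuristically guess whether a word2vec file is in binary format.

    Text format: each row after the header is "word v1 v2 ... v_dim".
    Binary format: each row is "word " followed by dim packed floats,
    which normally neither decodes as UTF-8 nor parses as numbers.
    """
    with open(path, "rb") as f:
        header = f.readline().split()
        if len(header) != 2:
            raise ValueError("not a word2vec file: unexpected header")
        dim = int(header[1])
        first_record = f.readline()
    try:
        fields = first_record.decode("utf-8").split()
        if len(fields) != dim + 1:
            return True
        for value in fields[1:]:
            float(value)  # raises ValueError if the field is not numeric
        return False
    except (UnicodeDecodeError, ValueError):
        return True
```

The result could then be passed as the binary argument, for example gensim.models.KeyedVectors.load_word2vec_format(path, binary=looks_like_binary_word2vec(path)), instead of hard-coding True.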
Some magical functions
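rasa-nlu-gao adds many new custom components. For usage and details, see that author's "rasa对话系统踩坑记" (notes on pitfalls in Rasa dialogue systems); I find it very helpful for newcomers to chatbots, and I thank the author for the contribution. Basic usage:

First install rasa-nlu-gao
pip install rasa-nlu-gao
Train the model
python bot.py train-nlu-gao
Test the model
python -m rasa_nlu_gao.server -c config_embedding_bilstm.yml --path models/nlu_gao/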
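Screenshots

----------------
Copyright notice: This is an original article by CSDN blogger 「Mr_不想起床」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/qq_42189083/article/details/88076128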

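Rasa semantic model error: "error": "bad value(s) in fds_to_keep"

The error appears when testing the model: training completes without problems, but testing fails.

Don't panic: this problem is caused by an incompatibility between sklearn and the Rasa framework.

First uninstall sklearn:

pip uninstall scikit-learn

Then reinstall it with a pinned version:

pip install scikit-learn==0.19.2

OK, problem solved!
----------------
Copyright notice: This is an original article by CSDN blogger 「聪明的小k」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/xuanzhuanguo/article/details/105033105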

rasa_nlu_chi test fails: "error": "y should be a 1d array, got an array of shape (1, 5) instead."

Original rasa_nlu_chi git repository:
https://github.com/crownpku/rasa_nlu_chi

Blog post on implementing rasa_nlu_chi:

Blog post with fixes for problems encountered during implementation:
https://ptorch.com/news/243.html

rasa-UI, written by an expert:
https://github.com/paschmann/rasa-ui
UI installation tutorial:
https://blog.csdn.net/u011244708/article/details/82924823?spm=1001.2014.3001.5501
Issue integrating the UI with nlu_chi:
https://ask.csdn.net/questions/3370686

Issue tracker:
https://github.com/crownpku/Rasa_NLU_Chi/issues

Train the Rasa NLU model

python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
Start the rasa_nlu backend server:

python -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
Open a new terminal; we should now be able to fetch results with curl. None of the following commands succeeded:

curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "rasa_nlu_test", "model": "models\default\model_20210617-142700"}' | python -mjson.tool
curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "", "model": "models\default\model_20210617-142700"}' | python -mjson.tool
curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "model": "models\default\model_20210617-142700"}' | python -mjson.tool
curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?"}' | python -mjson.tool
curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "model": "models\default\model_20210617-111551"}' | python -mjson.tool

E:\1rasa\code\Rasa_NLU_Chi\Rasa_NLU_Chi-master\models\default\model_20210617-111551

The test commands always return an error.

Switching to a different test method:
Open a browser and enter the following in the address bar

http://localhost:5000/parse?q=你好
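A browser percent-encodes the Chinese query text automatically; when building the same GET URL in a script, that encoding has to be done explicitly. A minimal sketch follows, where the helper name build_parse_url is hypothetical and the host and port are the ones used above:

```python
from urllib.parse import urlencode


def build_parse_url(text, host="localhost", port=5000):
    """Return the GET URL for /parse with the query text percent-encoded as UTF-8."""
    return "http://{}:{}/parse?{}".format(host, port, urlencode({"q": text}))


print(build_parse_url("你好"))
# prints http://localhost:5000/parse?q=%E4%BD%A0%E5%A5%BD
```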
Error:

Tried downgrading scikit-learn:
https://github.com/RasaHQ/rasa/issues/1436

Error:

rasa 2.5.0 requires networkx<2.6,>=2.4, but you have networkx 2.1 which is incompatible.
rasa 2.5.0 requires packaging<21.0,>=20.0, but you have packaging 17.1 which is incompatible.
rasa 2.5.0 requires pykwalify<1.9,>=1.7, but you have pykwalify 1.6.0 which is incompatible.
rasa 2.5.0 requires scikit-learn<0.25,>=0.22, but
Reinstall these four packages at their minimum compatible versions

rasa-nlu 0.14.4

rasa 2.5.0 requires cloudpickle<1.7,>=1.2, but you have cloudpickle 0.6.1 which is incompatible.
rasa 2.5.0 requires jsonschema<3.3,>=3.2, but you have jsonschema 2.6.0 which is incompatible.
rasa 2.5.0 requires matplotlib<3.4,>=3.1, but you have matplotlib 2.2.5 which is incompatible.
rasa 2.5.0 requires packaging<21.0,>=20.0, but you have packaging 18.0 which is incompatible.
rasa 2.5.0 requires ruamel.yaml<0.17.0,>=0.16.5, but you have ruamel-yaml 0.15.100 which is incompatible.
rasa 2.5.0 requires scikit-learn<0.25,>=0.22,

Issue report: after sending the request as described in the tutorial, the server returns "error": "y should be a 1d array, got an array of shape (1, 5) instead."

Tested and confirmed working!

Open sklearn_intent_classifier.py
C:\Users\Administrator\Desktop\Rasa_NLU_Chi\rasa_nlu\classifiers\sklearn_intent_classifier.py
and change return self.le.inverse_transform(y) to return self.le.inverse_transform(np.squeeze(y))
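To see why this change removes the error, it helps to look at the shapes involved. The sketch below uses only NumPy, with a plain array lookup standing in for LabelEncoder.inverse_transform; the label names are taken from the sample output in the README, and the index values are made up for illustration.

```python
import numpy as np

# Stand-in for the label encoder's classes_ (sorted intent names).
classes = np.array(["affirm", "goodbye", "greet", "medical", "restaurant_search"])

# The classifier returns ranked class indices as a 2-D row: shape (1, 5).
y = np.array([[3, 4, 0, 1, 2]])

# np.squeeze drops the length-1 axis, giving the 1-D array of shape (5,)
# that inverse_transform expects.
flat = np.squeeze(y)

# Equivalent of inverse_transform for this illustration: index into classes.
labels = classes[flat]
print(y.shape, flat.shape, labels[0])
```

One caveat: if there were only a single class, np.squeeze would return a 0-d array rather than a 1-D one, whereas np.ravel(y) always returns a 1-D array.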

 

 

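Problem: the curl test still fails
Attempt 1:

See def create_argument_parser() in train.py, which defines parser.add_argument('--project', ...), and train with an explicit project name:
python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path rasa_nlu_test --project rasa_nlu_test
----------------
Copyright notice: This is an original article by CSDN blogger 「Silber 甜」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_42639575/article/details/117988319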

Rasa NLU for Chinese, a fork from RasaHQ/rasa_nlu

Please refer to the latest instructions in the official Rasa NLU documentation.

Chinese blog

 

Files you should have:

  • data/total_word_feature_extractor_zh.dat

Trained from a Chinese corpus with the MITIE wordrep tool (training takes 2-3 days).

For training, please build the MITIE Wordrep Tool. Note that the Chinese corpus must be tokenized before it is fed into the tool. A closed-domain corpus that closely matches your use case works best.

A model trained on the Chinese Wikipedia dump and Baidu Baike can be downloaded from the Chinese blog.

  • data/examples/rasa/demo-rasa_zh.json

Add as many examples as possible.

Usage:

  1. Clone this project, and run
python setup.py install
  2. Modify configuration.

    Currently for Chinese we have two pipelines:

    Use MITIE+Jieba (sample_configs/config_jieba_mitie.yml):

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_classifier_mitie"

RECOMMENDED: Use MITIE+Jieba+sklearn (sample_configs/config_jieba_mitie_sklearn.yml):

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
  3. (Optional) Use a Jieba user-defined dictionary or switch the Jieba default dictionary:

    You can give either a file path or a directory path as the "user_dicts" value. (sample_configs/config_jieba_mitie_sklearn_plus_dict_path.yml)

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
  default_dict: "./default_dict.big"
  user_dicts: "./jieba_userdict"
#  user_dicts: "./jieba_userdict/jieba_userdict.txt"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
  4. Train the model by running:

    If you specify a project name in the configuration file, the model is saved at /models/your_project_name.

    Otherwise, the model is saved at /models/default

python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
  5. Run the rasa_nlu server:
python -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models
  6. Open a new terminal; you can now curl results from the server, for example:
$ curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "rasa_nlu_test", "model": "model_20170921-170911"}' | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   652    0   552  100   100    157     28  0:00:03  0:00:03 --:--:--   157
{
    "entities": [
        {
            "end": 3,
            "entity": "disease",
            "extractor": "ner_mitie",
            "start": 1,
            "value": "发烧"
        }
    ],
    "intent": {
        "confidence": 0.5397186422631861,
        "name": "medical"
    },
    "intent_ranking": [
        {
            "confidence": 0.5397186422631861,
            "name": "medical"
        },
        {
            "confidence": 0.16206323981749196,
            "name": "restaurant_search"
        },
        {
            "confidence": 0.1212448457737397,
            "name": "affirm"
        },
        {
            "confidence": 0.10333600028547868,
            "name": "goodbye"
        },
        {
            "confidence": 0.07363727186010374,
            "name": "greet"
        }
    ],
    "text": "我发烧了该吃什么药?"
}
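Client code usually needs only the top intent and the entities from a response like the one above. The following is a minimal sketch of pulling them out; the helper name summarize_parse_result is hypothetical, and the sample is an abbreviated copy of the response shown above. It also illustrates that start and end are character offsets into text.

```python
import json


def summarize_parse_result(result):
    """Return (intent name, confidence, [(entity, value, text slice), ...])."""
    text = result["text"]
    entities = [
        # text[start:end] should reproduce the extracted value.
        (e["entity"], e["value"], text[e["start"]:e["end"]])
        for e in result["entities"]
    ]
    intent = result["intent"]
    return intent["name"], intent["confidence"], entities


sample = json.loads("""
{
    "entities": [
        {"end": 3, "entity": "disease", "extractor": "ner_mitie",
         "start": 1, "value": "发烧"}
    ],
    "intent": {"confidence": 0.5397186422631861, "name": "medical"},
    "text": "我发烧了该吃什么药?"
}
""")

name, confidence, entities = summarize_parse_result(sample)
print(name, round(confidence, 2), entities)
# prints medical 0.54 [('disease', '发烧', '发烧')]
```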