Deploy the Rasa framework to the Kubernetes cluster and allocate GPU resources

Rasa is a conversational AI framework for building context-aware intelligent voice assistants. In this post, let's see how to deploy it to a k8s cluster and try the demo model it ships with.

I am Xiaofu, a T-shaped Internet practitioner committed to lifelong learning. If you like my blog, follow me on CSDN. If you have any questions, leave them in the comment area below. Thank you.

Rasa is a Python-based AI framework that trains a model from user-provided data to power an intelligent voice assistant. For more information, see the official website.

General installation

With Python 3.6 or 3.7, you can install Rasa directly with pip

pip3 install rasa

The installation automatically creates the rasa executable used for the command line operations below

Command line operations

The command line covers most operations, from project creation through training to serving. The following common commands are used in the rest of this post

Command       Description
rasa init     Create a new project with demo data
rasa train    Train a model from the provided NLU data; the result is saved in ./models
rasa run      Start a Rasa server with the trained model; listens on port 5005 by default (configurable)
rasa -h       Show the help text

For more command line operations, see the reference here
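To tie the commands in the table together, here is a minimal Python sketch that lists them in the order a fresh project would use them. The helper name rasa_workflow is hypothetical; the flags --no-prompt and --enable-api are the ones used later in this post, and --port is the rasa run option for changing the default port.

```python
def rasa_workflow(port=5005):
    """Return the argv lists for a create -> train -> serve run, in order."""
    return [
        ["rasa", "init", "--no-prompt"],  # demo project, no interactive questions
        ["rasa", "train"],                # writes the trained model under ./models
        ["rasa", "run", "--enable-api", "--port", str(port)],  # serves until stopped
    ]


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for argv in rasa_workflow():
        print(" ".join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) on a machine where the rasa executable is installed.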

API operations

After rasa run has started, you can call the HTTP API on the port the Rasa server listens on, for example to check the server status, submit data for training, or talk to the trained model. All API information can be viewed here.

One endpoint deserves special mention: the server status check, a GET request on the root path / of the port the server listens on.

If the server is healthy it returns 200, so this request can be used as the liveness probe of the container in k8s.
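As a rough illustration of what such a status check does, the following Python sketch sends a GET request to the server's root path and treats a 200 response as alive. The address http://localhost:5005 and the helper name is_alive are assumptions made for this example; they are not part of Rasa.

```python
import urllib.error
import urllib.request


def is_alive(base_url, timeout=2.0):
    """Return True if a GET on the server's root path answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, a timeout, or an HTTP error status all count as "not alive".
        return False


if __name__ == "__main__":
    # Assumed address: a Rasa server started locally with `rasa run`.
    print(is_alive("http://localhost:5005"))
```

The k8s httpGet probe is slightly more lenient than this sketch: it accepts any status from 200 up to, but not including, 400.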

Deploy to k8s

Let's start the deployment. My environment is the cluster built in my earlier post "Building Kubernetes cluster from zero to container (Nvidia version)", which can allocate GPU resources to the container used in this article.

Prepare a custom docker image

There is a ready-made image, rasa/rasa, on Docker Hub, but based on the pitfalls I hit during deployment, one thing needs to change. For security reasons, the rasa/rasa Dockerfile sets the user in the container to 1001, as follows

USER 1001

If you cannot reach Docker Hub, you may need to go through a proxy

Running as user 1001 is safer, but it can cause permission problems during deployment, so we build a new image that switches the user to root. While we are at it, we also change the container's default command from rasa --help to rasa init, as follows

FROM rasa/rasa:1.10.1-full
USER root
CMD ["init"]

Then build the image

docker build -t myrasa:v2 .

This image is used for the k8s deployment below


Create the yaml file for the deployment, rasa-deployment.yaml, as follows

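# extensions/v1beta1 is served only by clusters older than Kubernetes 1.16;
# newer clusters need apps/v1 together with a spec.selector.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rasa-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rasa
    spec:
      containers:
        - name: myrasa
          image: myrasa:v2
          ports:
            - containerPort: 5005
              hostPort: 5005
              protocol: TCP
          command: ["/bin/sh","-c","rasa init --no-prompt;rasa run --enable-api"]
          livenessProbe:
            httpGet:
              port: 5005
              path: /
            initialDelaySeconds: 1
            periodSeconds: 3
          resources:
            limits:
              # 1GiB of GPU memory; the resource name assumes the aliyun gpushare plug-in
              aliyun.com/gpu-mem: 1
          volumeMounts:
            - mountPath: /app
              name: rasa-app
            - mountPath: /tmp
              name: rasa-tmp
      volumes:
        - name: rasa-app
          emptyDir: {}
        - name: rasa-tmp
          emptyDir: {}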

A few parts deserve attention

  • livenessProbe

If you do not specify a liveness probe, k8s judges whether the container is alive only by whether the process started by command is still running, and a container whose command exits is restarted over and over. That check is not suitable here, so an httpGet liveness probe is used instead: as long as port 5005 of the container returns 200, the container is considered alive and the pod status stays Running. For more on liveness probes, see "[Kubernetes 006] Probes in the container life cycle"

  • command

The command here overrides the ENTRYPOINT in the Dockerfile. Two commands are executed in total. rasa init creates a new project; it normally asks questions interactively, so --no-prompt is used to accept the defaults automatically, otherwise an error is reported. rasa run then starts a Rasa server listening on the default port 5005

If you do not go through /bin/sh -c, the command string is not interpreted by a shell, so the ; that chains the two commands has no effect and the command fails

  • volume

According to the base Dockerfile, /app is the working directory, and it is where the trained model is stored. For simplicity, emptyDir is used here. If you want to work on the files directly from the host, you can use hostPath, but with multiple hosts you must keep the paths consistent across them. To learn more about volumes in k8s, see "[Kubernetes 013] Volume principle and practical operation details"

  • resources

The cluster was deployed with the aliyun plug-in, which is used here to allocate 1GiB of GPU memory to each container. Because there is only one replica, the total allocation is 1GiB
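The point about /bin/sh -c in the command bullet can be seen in a small Python sketch. It uses echo as a stand-in for the two rasa commands; the stand-in and the helper name run are assumptions made for illustration. Without a shell, the ; is just another argument handed to the first program, so the second command never runs.

```python
import subprocess


def run(argv):
    """Run argv directly (no implicit shell) and return what it wrote to stdout."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


# Exec form without a shell: ";" and everything after it are plain arguments to echo.
no_shell = run(["echo", "first;", "echo", "second"])

# The same text handed to /bin/sh -c: the shell splits on ";" and runs two commands.
with_shell = run(["/bin/sh", "-c", "echo first; echo second"])

print(repr(no_shell))    # 'first; echo second\n'
print(repr(with_shell))  # 'first\nsecond\n'
```

The container command behaves the same way: only the /bin/sh -c form treats rasa init and rasa run as two separate commands.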

Start deployment

kubectl apply -f rasa-deployment.yaml

Check the status after success

root@control-plane-1:~# kubectl get pod -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
curl-6bf6db5c4f-fr2xv                1/1     Running   0          28h   gpu-node   <none>           <none>
nvidia-deployment-5f4bbd9457-2rj7r   1/1     Running   1          28h   gpu-node   <none>           <none>
nvidia-deployment-5f4bbd9457-hfxqh   1/1     Running   1          28h   gpu-node   <none>           <none>
rasa-deployment-586ff4666-hf6xm      1/1     Running   0          26h   gpu-node   <none>           <none>

At this point you can call the API on port 5005 through the pod IP, for example

root@control-plane-1:~# curl
Hello from Rasa: 1.10.1

At the same time, check the GPU allocation

root@control-plane-1:~# kubectl-inspect-gpushare -d

NAME:       gpu-node

NAME                                NAMESPACE  GPU0(Allocated)  
nvidia-deployment-5f4bbd9457-2rj7r  default    1                
nvidia-deployment-5f4bbd9457-hfxqh  default    1                
rasa-deployment-586ff4666-hf6xm     default    1                
Allocated :                         3 (42%)    
Total :                             7          

Allocated/Total GPU Memory In Cluster:  3/7 (42%)  

Together with the two containers created earlier for testing, a total of 3GiB of GPU memory is allocated, which matches the expectation
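The totals in the report are simple arithmetic and can be reproduced with a few lines of Python. The per-pod values and the total of 7 are copied from the output above; the helper name summarize is hypothetical, and truncating the percentage is an assumption chosen because it matches the 42% shown for 3/7.

```python
def summarize(allocated_gib, total_gib):
    """Sum per-pod GPU memory (GiB) and return (allocated, whole-number percent)."""
    allocated = sum(allocated_gib.values())
    percent = allocated * 100 // total_gib  # truncated, e.g. 3/7 -> 42
    return allocated, percent


pods = {
    "nvidia-deployment-5f4bbd9457-2rj7r": 1,
    "nvidia-deployment-5f4bbd9457-hfxqh": 1,
    "rasa-deployment-586ff4666-hf6xm": 1,
}
allocated, percent = summarize(pods, total_gib=7)
print(f"{allocated}/7 ({percent}%)")  # prints 3/7 (42%)
```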

However, because no training task is running, the actual GPU usage is low

root@gpu-node:/home/ubuntu/rasa# nvidia-smi
Thu May 21 17:57:48 2020       
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
| 24%   27C    P8    10W / 215W |     19MiB /  7982MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      1765      G   /usr/lib/xorg/Xorg                             9MiB |
|    0      1799      G   /usr/bin/gnome-shell                           8MiB |


Finally, users outside the cluster need to be able to reach the server, so the port has to be exposed. This is fairly simple with a NodePort service

apiVersion: v1
kind: Service
metadata:
  name: rasa-service
spec:
  type: NodePort
  selector:
    app: rasa
  ports:
    - name: http
      port: 5005
      targetPort: 5005
      nodePort: 31111

Note that nodePort, if specified, must fall within the cluster's node port range (30000-32767 by default). It can also be omitted so that the system assigns a port automatically.
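A quick sanity check for a hand-picked nodePort can be sketched in Python. The range 30000-32767 is the Kubernetes default for node ports, which a cluster administrator can change; the function name is_valid_node_port is hypothetical.

```python
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)  # Kubernetes default: 30000-32767 inclusive


def is_valid_node_port(port, allowed=DEFAULT_NODE_PORT_RANGE):
    """Return True if port lies inside the allowed node port range."""
    return port in allowed


print(is_valid_node_port(31111))  # True: the nodePort used in the service above
print(is_valid_node_port(5005))   # False: the container port is outside the range
```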


kubectl apply -f rasa-service.yaml

Check it out

root@control-plane-1:~# kubectl get svc
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
kubernetes     ClusterIP      <none>        443/TCP          2d
rasa-service   NodePort   <none>        5005:31111/TCP   20h

After that, you can reach the API through the node's public IP

[root@ai-asterisk ~]# curl
Hello from Rasa: 1.10.1

API operations

Using the API is not the focus of this post, but one API must be mentioned: the interactive API, which lets you talk to the bot. Its format is as follows

POST /webhooks/rest/webhook

The JSON body of the POST request has the following format

{
  "sender": "Rasa",
  "message": "Hi there!"
}

sender can be any value you like. message stands in for what the user says.

For example

[root@ai-asterisk ~]# curl -X POST -d '{"sender":"xiaofu","message":"Hi there."}'
[{"recipient_id":"xiaofu","text":"Hey! How are you?"}]
[root@ai-asterisk ~]# curl -X POST -d '{"sender":"xiaofu","message":"What is your name?"}'
[{"recipient_id":"xiaofu","text":"I am a bot, powered by Rasa."}]
[root@ai-asterisk ~]# curl -X POST -d '{"sender":"xiaofu","message":"I love you."}'
[{"recipient_id":"xiaofu","text":"Great, carry on!"}]
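The same exchange can be scripted. The following is a minimal Python sketch using only the standard library; the helper names send_message and reply_texts are hypothetical, and the base URL must be supplied by the caller (for the service above, the node's address with port 31111).

```python
import json
import urllib.request


def send_message(base_url, sender, message, timeout=10.0):
    """POST one user message to the REST channel and return the bot's replies."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def reply_texts(replies):
    """Extract the "text" field from each reply that has one."""
    return [reply["text"] for reply in replies if "text" in reply]


# A reply copied from the curl session above, to show the expected shape.
sample = [{"recipient_id": "xiaofu", "text": "Hey! How are you?"}]
print(reply_texts(sample))  # prints ['Hey! How are you?']
```

A call such as send_message(base_url, "xiaofu", "Hi there.") returns a list shaped like sample, one entry per bot message.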


With that, Rasa is successfully deployed to the k8s cluster. But this is still far from making full use of k8s. If we combine it with the k8s API to automatically create pods in which users train their own models, the setup becomes much smarter. We will discuss that later.


Posted on Thu, 21 May 2020 03:52:53 -0700 by moallam