GPU-Virtual-Service/README.md (26 changes: 24 additions & 2 deletions)

@@ -114,15 +114,17 @@ kubectl apply -f volcano-development.yaml
- containerd:

```Bash
ctr -n=k8s.io i import gpu_device_plugin.tar
ctr -n=k8s.io i import cuda_client_update.tar
ctr -n=k8s.io i import xpu-exporter.tar
```

- docker:

```Bash
docker load -i gpu_device_plugin.tar
docker load -i cuda_client_update.tar
docker load -i xpu-exporter.tar
```

Create the xpu namespace

@@ -452,6 +454,18 @@ cd {filepath}/GPU-Virtual-Service/xpu-pool-service/GPU-device-plugin && go mod t
Here, {filepath} should be replaced with the path to your local flexai code.
Build outputs: `gpu-device-plugin` and `xpu-client-tool`.

#### xpu-exporter

The Go version is 1.22.1; using the same version is recommended:

```Bash
export CGO_ENABLED=0
cd {filepath}/GPU-Virtual-Service/xpu-pool-service/xpu-exporter && go mod tidy && go build -o xpu-exporter ./cmd/xpu-exporter
```

Here, {filepath} should be replaced with the path to your local flexai code.
Build output: `xpu-exporter`.

#### Scheduling components

The compiled scheduling components can be found in the `lib/` folder.

@@ -476,6 +490,7 @@ chmod +x {filepath}/GPU-Virtual-Service/xpu-pool-service/client_update/cuda-clie

```Bash
cp -rf {filepath}/GPU-Virtual-Service/xpu-pool-service/GPU-device-plugin/gpu-device-plugin docker-build/gpu-device-plugin
cp -rf {filepath}/GPU-Virtual-Service/xpu-pool-service/xpu-exporter/xpu-exporter docker-build/xpu-exporter
```

Download the OS base image from the following link, then proceed with deployment.

@@ -497,13 +512,20 @@ docker build -t cuda_client_update:2.0 .
docker build -t gpu_device_plugin:2.0 .
```

In the `docker-build/xpu-exporter` directory, run:

```Bash
docker build -t xpu-exporter:2.0 .
```

(The trailing `.` in the commands above sets the build context to the current directory and must not be omitted.)

The images are now built. Save them to local archives with the following commands:

```Bash
docker save -o gpu_device_plugin.tar gpu_device_plugin:2.0
docker save -o cuda_client_update.tar cuda_client_update:2.0
docker save -o xpu-exporter.tar xpu-exporter:2.0
```

Packaging the volcano scheduler follows the same process: copy the build artifacts into the corresponding `docker-build/` folder:

@@ -197,13 +197,13 @@ func updateVgpuDeviceInfo(ch chan<- prometheus.Metric, gpu *utils.XPUDevice) {

```go
	vgpuPodMap := make(map[string]int)
	for _, vgpu := range gpu.VxpuDeviceList {
		ch <- prometheus.MustNewConstMetric(xpuVgpuUtilizationDesc, prometheus.GaugeValue,
			vgpu.VxpuCoreUtilization, gpu.Id, gpu.NodeName, gpu.NodeIp, vgpu.PodUID,
			vgpu.ContainerName, vgpu.Id, strconv.Itoa(int(vgpu.VxpuCoreLimit)),
			strconv.Itoa(int(vgpu.VxpuMemoryLimit)))
		ch <- prometheus.MustNewConstMetric(xpuVgpuMemoryUtilizationDesc, prometheus.GaugeValue,
			vgpu.VxpuMemoryUtilization, gpu.Id, gpu.NodeName, gpu.NodeIp, vgpu.PodUID,
			vgpu.ContainerName, vgpu.Id, strconv.Itoa(int(vgpu.VxpuCoreLimit)),
			strconv.Itoa(int(vgpu.VxpuMemoryLimit)))
		if _, ok := vgpuPodMap[vgpu.PodUID]; !ok {
			vgpuPodNumber += 1
			vgpuPodMap[vgpu.PodUID] = vgpuPodNumber
```
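Why the patched calls compile: in the Prometheus Go client, `MustNewConstMetric` is declared as `MustNewConstMetric(desc *Desc, valueType ValueType, value float64, labelValues ...string) Metric`, so label values must be passed either as individual strings or as a slice spread with `...`. Passing a bare `[]string{...}`, as the pre-change lines of this hunk did, is a compile error. The sketch below shows both valid call forms using a hypothetical stand-in (`newConstMetric`, not part of any library) with the same variadic shape, so it runs with the standard library only; the metric name and label values are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// newConstMetric is a hypothetical stand-in with the same variadic shape as
// prometheus.MustNewConstMetric(desc, valueType, value, labelValues ...string).
// It only formats its inputs so the two call forms can be compared.
func newConstMetric(name string, value float64, labelValues ...string) string {
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(labelValues, ","), value)
}

func main() {
	coreLimit, memLimit := 50, 1024 // hypothetical vGPU limits

	// Form used after the fix: each label value is its own argument.
	a := newConstMetric("xpu_vgpu_utilization", 0.5,
		"gpu-0", "node-1", strconv.Itoa(coreLimit), strconv.Itoa(memLimit))

	// Equivalent form: build a slice and spread it with `...`.
	labels := []string{"gpu-0", "node-1", strconv.Itoa(coreLimit), strconv.Itoa(memLimit)}
	b := newConstMetric("xpu_vgpu_utilization", 0.5, labels...)

	// newConstMetric("xpu_vgpu_utilization", 0.5, labels) would not compile:
	// a []string cannot be used as a single string argument.
	fmt.Println(a)      // xpu_vgpu_utilization{gpu-0,node-1,50,1024} 0.5
	fmt.Println(a == b) // true
}
```

Spreading a prebuilt slice is convenient when label values are assembled conditionally; listing them inline, as the patched code does, keeps the argument order visible next to the descriptor.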

@@ -9,13 +9,11 @@ import (

```go
	"context"
	"errors"
	"fmt"
	"math"
	"net/http"
	"regexp"
	"strconv"
	"strings"
	"sync"
	"syscall"
	"time"

	"huawei.com/xpu-exporter/common/cache"
```

@@ -108,7 +106,7 @@ func (h *limitHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {

```go
	req.Body = http.MaxBytesReader(w, req.Body, h.limitBytes)
	ctx := initContext(req)
	path := req.URL.Path
	_ = req.UserAgent() // avoid unused variable error
	clientIP := utils.ClientIP(req)

	// Check if the IP has exceeded the limit of 20 requests per minute
```


GPU-Virtual-Service/xpu-pool-service/xpu-exporter/go.mod (4 changes: 2 additions & 2 deletions)

@@ -24,14 +24,14 @@ require (
```
	github.com/kr/text v0.2.0 // indirect
	github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
	github.com/prometheus/client_model v0.3.0 // indirect
	github.com/prometheus/common v0.42.0 // indirect
	github.com/prometheus/procfs v0.10.1 // indirect
	github.com/rogpeppe/go-internal v1.12.0 // indirect
	github.com/sirupsen/logrus v1.8.2 // indirect
	golang.org/x/net v0.26.0 // indirect
	golang.org/x/sys v0.21.0 // indirect
	golang.org/x/text v0.16.0 // indirect
	google.golang.org/genproto/googleapis/rpc v0.0.0-20240701130421-f6361c86f094 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)
```