bug: memory of position_encoding_table is not malloced correctly. #790

### Branch/Tag/Commit

main

### Docker Image Version

nvcr.io/nvidia/pytorch:22.12-py3

### GPU name

A10

### CUDA Driver

535.54.03

### Reproduced Steps

```shell
docker run -ti --gpus all --rm nvcr.io/nvidia/pytorch:22.12-py3 bash
git clone --recursive https://github.com/NVIDIA/FasterTransformer.git
cd FasterTransformer
mkdir build
cd build
cmake -DSM=86 -DCMAKE_BUILD_TYPE=Release ..
make -j14
CUDA_VISIBLE_DEVICES=0 ./satrn 1 1 8 64 2048 4022 3 100 576 512 0 0.0 0
```
Abnormal behavior:
In https://github.com/NVIDIA/FasterTransformer/blob/df4a7534860137e060e18d2ebf019906120ea204/src/fastertransformer/kernels/decoding_kernels.cu#L137, `step_offset` is computed in strides of `hidden_units`, so each position occupies `hidden_units` elements of the position encoding table. See also:
https://github.com/NVIDIA/FasterTransformer/blob/df4a7534860137e060e18d2ebf019906120ea204/src/fastertransformer/kernels/decoding_kernels.cu#L134

So I think https://github.com/NVIDIA/FasterTransformer/blob/df4a7534860137e060e18d2ebf019906120ea204/src/fastertransformer/models/decoding/DecodingWeight.h#L101 should be 
```cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * hidden_units_); ```
instead of 
```cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * vocab_size_);```

The same problem appears in two other places:
https://github.com/NVIDIA/FasterTransformer/blob/df4a7534860137e060e18d2ebf019906120ea204/src/fastertransformer/models/decoding/DecodingWeight.h#L77
https://github.com/NVIDIA/FasterTransformer/blob/df4a7534860137e060e18d2ebf019906120ea204/src/fastertransformer/models/decoding/DecodingWeight.h#L118
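
To make the size argument concrete, here is a minimal sketch of the arithmetic. It assumes the lookup reads `position_encoding[step * hidden_units + col]` with `step < max_seq_len` and `col < hidden_units`, as described above. The helper names and the numbers are hypothetical and are not part of FasterTransformer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a FasterTransformer function): the largest element
// index read from the position encoding table, assuming the lookup uses
// position_encoding[step * hidden_units + col] with step < max_seq_len and
// col < hidden_units. Both arguments must be at least 1.
constexpr std::size_t max_position_index(std::size_t max_seq_len, std::size_t hidden_units)
{
    return (max_seq_len - 1) * hidden_units + (hidden_units - 1);
}

// Hypothetical helper: true when element_count is exactly the number of
// elements that the max_seq_len x hidden_units layout requires.
constexpr bool matches_layout(std::size_t element_count, std::size_t max_seq_len, std::size_t hidden_units)
{
    return element_count == max_position_index(max_seq_len, hidden_units) + 1;
}

// Illustrative values only; they are not taken from the reproduction command.
inline void check_illustrative_sizes()
{
    const std::size_t max_seq_len  = 100;
    const std::size_t hidden_units = 512;
    const std::size_t vocab_size   = 4022;

    // Sizing by hidden_units matches the indexing pattern.
    assert(matches_layout(max_seq_len * hidden_units, max_seq_len, hidden_units));
    // Sizing by vocab_size does not.
    assert(!matches_layout(max_seq_len * vocab_size, max_seq_len, hidden_units));
}
```

Under these assumptions, a count of `max_seq_len_ * vocab_size_` only matches the table layout in the coincidental case where `vocab_size_` equals `hidden_units_`.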

I have opened a PR to try to fix it. @byshiue

