77 commits
90f4118
Renderer: Add prepareForDraw callback
wheremyfoodat Jul 24, 2024
a2b8a7b
Add fmt submodule and port shader decompiler instructions to it
wheremyfoodat Jul 24, 2024
251ff5e
Add shader acceleration setting
wheremyfoodat Jul 24, 2024
2f4c169
Hook up vertex shaders to shader cache
wheremyfoodat Jul 25, 2024
69accde
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 25, 2024
efcb42a
Shader decompiler: Fix redundant compilations
wheremyfoodat Jul 25, 2024
d9f4f37
Shader Decompiler: Fix vertex attribute upload
wheremyfoodat Jul 25, 2024
2fc0922
Shader compiler: Simplify generated code for reading and faster compi…
wheremyfoodat Jul 25, 2024
2131838
Further simplify shader decompiler output
wheremyfoodat Jul 25, 2024
e8b4992
Shader decompiler: More smallen-ing
wheremyfoodat Jul 25, 2024
fd90cf7
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 26, 2024
67ff1cc
Shader decompiler: Get PICA uniforms uploaded to the GPU
wheremyfoodat Jul 26, 2024
db64b0a
Shader decompiler: Readd clipping
wheremyfoodat Jul 26, 2024
9274a95
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 26, 2024
67daf03
Shader decompiler: Actually `break` on control flow instructions
wheremyfoodat Jul 26, 2024
ff3afd4
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 26, 2024
5eb15de
Shader decompiler: More control flow handling
wheremyfoodat Jul 26, 2024
a20982f
Shader decompiler: Fix desitnation mask
wheremyfoodat Jul 26, 2024
4470550
Shader Decomp: Remove pair member capture in lambda (unsupported on NDK)
wheremyfoodat Jul 27, 2024
37d7bad
Disgusting changes to handle the fact that hw shader shaders are 2x a…
wheremyfoodat Jul 28, 2024
9ee1c39
Shader decompiler: Implement proper output semantic mapping
wheremyfoodat Jul 28, 2024
6c738e8
Moar instructions
wheremyfoodat Jul 28, 2024
d125180
Shader decompiler: Add FLR/SLT/SLTI/SGE/SGEI
wheremyfoodat Jul 28, 2024
b3f35d8
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 28, 2024
4040d88
Shader decompiler: Add register indexing
wheremyfoodat Jul 28, 2024
94bd060
Shader decompiler: Optimize mova with both x and y masked
wheremyfoodat Jul 28, 2024
59f4f23
Shader decompiler: Add DPH/DPHI
wheremyfoodat Jul 28, 2024
7209740
Fix shader caching being broken
wheremyfoodat Jul 28, 2024
0d6bef2
PICA decompiler: Cache VS uniforms
wheremyfoodat Jul 28, 2024
1c9df7c
Simply vertex cache code
wheremyfoodat Jul 28, 2024
53ee3f3
Simplify vertex cache code
wheremyfoodat Jul 28, 2024
b53df87
Merge branch 'shader-decomp' of https://github.com/wheremyfoodat/Pand…
wheremyfoodat Jul 28, 2024
ffcf352
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 31, 2024
b46f7ad
Shader decompiler: Add loops
wheremyfoodat Jul 31, 2024
370aa8e
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 7, 2024
c7371e3
Shader decompiler: Implement safe multiplication
wheremyfoodat Aug 7, 2024
1366e7a
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 19, 2024
7e04ab7
Shader decompiler: Implement LG2/EX2
wheremyfoodat Aug 19, 2024
e481ce8
Shader decompiler: More control flow
wheremyfoodat Aug 19, 2024
943cf9b
Shader decompiler: Fix JMPU condition
wheremyfoodat Aug 19, 2024
30a6514
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 20, 2024
73a5d44
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 20, 2024
652b600
Shader decompiler: Convert main function to void
wheremyfoodat Aug 20, 2024
e13ef42
PICA: Start implementing GPU vertex fetch
wheremyfoodat Aug 20, 2024
cf31f7b
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 23, 2024
74a341b
More hw VAO work
wheremyfoodat Aug 23, 2024
5d6f591
More hw VAO work
wheremyfoodat Aug 23, 2024
349de65
Merge branch 'shader-decomp' of https://github.com/wheremyfoodat/Pand…
wheremyfoodat Aug 24, 2024
a8b30ee
More GPU vertex fetch code
wheremyfoodat Aug 24, 2024
e34bdb6
Add GL Stream Buffer from Duckstation
wheremyfoodat Aug 24, 2024
f96b609
GL: Actually upload data to stream buffers
wheremyfoodat Aug 25, 2024
33e63f7
GPU: Cleanup immediate mode handling
wheremyfoodat Aug 25, 2024
5432a5a
Get first renders working with accelerated draws
wheremyfoodat Aug 25, 2024
e925a91
Shader decompiler: Fix control flow analysis bugs
wheremyfoodat Aug 25, 2024
37a43e2
HW shaders: Accelerate indexed draws
wheremyfoodat Aug 25, 2024
ca2d7e4
Shader decompiler: Add support for compilation errors
wheremyfoodat Aug 25, 2024
0c2ae1b
GLSL decompiler: Fall back for LITP
wheremyfoodat Aug 25, 2024
0e7697d
Add Renderdoc scope classes
wheremyfoodat Aug 25, 2024
e332ab2
Fix control flow analysis bug
wheremyfoodat Sep 2, 2024
15b6a9e
HW shaders: Fix attribute fetch
wheremyfoodat Sep 2, 2024
4a39b06
Rewriting hw vertex fetch
wheremyfoodat Sep 4, 2024
1642537
Stream buffer: Fix copy-paste mistake
wheremyfoodat Oct 4, 2024
09b0470
HW shaders: Fix indexed rendering
wheremyfoodat Oct 4, 2024
0a2bc7c
HW shaders: Add padding attributes
wheremyfoodat Oct 4, 2024
e3252ec
HW shaders: Avoid redundant glVertexAttrib4f calls
wheremyfoodat Oct 5, 2024
872a6ba
HW shaders: Fix loops
wheremyfoodat Oct 6, 2024
2b82f8b
Merge branch 'master' into shader-decomp
wheremyfoodat Oct 6, 2024
bb7b1b3
HW shaders: Make generated shaders slightly smaller
wheremyfoodat Oct 6, 2024
53097cc
Fix libretro build
wheremyfoodat Oct 6, 2024
b833e07
Update config.hpp
wheremyfoodat Oct 6, 2024
12d0810
Update renderer_gl.cpp
wheremyfoodat Oct 6, 2024
56c3e73
Add Android logging when a shader fails to compile
wheremyfoodat Oct 6, 2024
40a7ac6
Update opengl.hpp
wheremyfoodat Oct 6, 2024
5202d91
Shader Decompiler: Add int/float precision qualifiers.
wheremyfoodat Oct 6, 2024
214c1d8
Update renderer_gl.cpp
wheremyfoodat Oct 6, 2024
0252a2a
Update shader_decompiler.cpp
wheremyfoodat Oct 6, 2024
aa18129
Update opengl.hpp
wheremyfoodat Oct 8, 2024
3 changes: 3 additions & 0 deletions .gitmodules
@@ -76,6 +76,9 @@
[submodule "third_party/metal-cpp"]
path = third_party/metal-cpp
url = https://github.com/Panda3DS-emu/metal-cpp
[submodule "third_party/fmt"]
path = third_party/fmt
url = https://github.com/fmtlib/fmt
[submodule "third_party/fdk-aac"]
path = third_party/fdk-aac
url = https://github.com/Panda3DS-emu/fdk-aac/
12 changes: 8 additions & 4 deletions CMakeLists.txt
@@ -146,11 +146,13 @@ if (NOT ANDROID)
target_link_libraries(AlberCore PUBLIC SDL2-static)
endif()

add_subdirectory(third_party/fmt)
add_subdirectory(third_party/toml11)
include_directories(${SDL2_INCLUDE_DIR})
include_directories(third_party/toml11)
include_directories(third_party/glm)
include_directories(third_party/renderdoc)
include_directories(third_party/duckstation)

add_subdirectory(third_party/cmrc)

@@ -263,7 +265,7 @@ set(PICA_SOURCE_FILES src/core/PICA/gpu.cpp src/core/PICA/regs.cpp src/core/PICA
src/core/PICA/shader_interpreter.cpp src/core/PICA/dynapica/shader_rec.cpp
src/core/PICA/dynapica/shader_rec_emitter_x64.cpp src/core/PICA/pica_hash.cpp
src/core/PICA/dynapica/shader_rec_emitter_arm64.cpp src/core/PICA/shader_gen_glsl.cpp
- src/core/PICA/shader_decompiler.cpp
+ src/core/PICA/shader_decompiler.cpp src/core/PICA/draw_acceleration.cpp
)

set(LOADER_SOURCE_FILES src/core/loader/elf.cpp src/core/loader/ncsd.cpp src/core/loader/ncch.cpp src/core/loader/3dsx.cpp src/core/loader/lz77.cpp)
@@ -315,7 +317,8 @@ set(HEADER_FILES include/emulator.hpp include/helpers.hpp include/termcolor.hpp
include/audio/miniaudio_device.hpp include/ring_buffer.hpp include/bitfield.hpp include/audio/dsp_shared_mem.hpp
include/audio/hle_core.hpp include/capstone.hpp include/audio/aac.hpp include/PICA/pica_frag_config.hpp
include/PICA/pica_frag_uniforms.hpp include/PICA/shader_gen_types.hpp include/PICA/shader_decompiler.hpp
- include/sdl_sensors.hpp include/renderdoc.hpp include/audio/aac_decoder.hpp
+ include/PICA/pica_vert_config.hpp include/sdl_sensors.hpp include/PICA/draw_acceleration.hpp include/renderdoc.hpp
+ include/align.hpp include/audio/aac_decoder.hpp
)

cmrc_add_resource_library(
@@ -348,7 +351,6 @@ if(ENABLE_LUAJIT AND NOT ANDROID)
endif()

if(ENABLE_QT_GUI)
- include_directories(third_party/duckstation)
set(THIRD_PARTY_SOURCE_FILES ${THIRD_PARTY_SOURCE_FILES} third_party/duckstation/window_info.cpp third_party/duckstation/gl/context.cpp)

if(APPLE)
@@ -391,6 +393,8 @@ if(ENABLE_OPENGL)
src/host_shaders/opengl_fragment_shader.frag
)

set(THIRD_PARTY_SOURCE_FILES ${THIRD_PARTY_SOURCE_FILES} third_party/duckstation/gl/stream_buffer.cpp)

set(HEADER_FILES ${HEADER_FILES} ${RENDERER_GL_INCLUDE_FILES})
source_group("Source Files\\Core\\OpenGL Renderer" FILES ${RENDERER_GL_SOURCE_FILES})

@@ -480,7 +484,7 @@ set(ALL_SOURCES ${SOURCE_FILES} ${FS_SOURCE_FILES} ${CRYPTO_SOURCE_FILES} ${KERN
target_sources(AlberCore PRIVATE ${ALL_SOURCES})

target_link_libraries(AlberCore PRIVATE dynarmic cryptopp glad resources_console_fonts teakra fdk-aac)
- target_link_libraries(AlberCore PUBLIC glad capstone)
+ target_link_libraries(AlberCore PUBLIC glad capstone fmt::fmt)

if(ENABLE_DISCORD_RPC AND NOT ANDROID)
target_compile_definitions(AlberCore PUBLIC "PANDA3DS_ENABLE_DISCORD_RPC=1")
45 changes: 45 additions & 0 deletions include/PICA/draw_acceleration.hpp
@@ -0,0 +1,45 @@
#pragma once

#include <array>

#include "helpers.hpp"

namespace PICA {
struct DrawAcceleration {
static constexpr u32 maxAttribCount = 16;
static constexpr u32 maxLoaderCount = 12;

struct AttributeInfo {
u32 offset;
u32 stride;

u8 type;
u8 componentCount;

std::array<float, 4> fixedValue; // For fixed attributes
};

struct Loader {
// Data to upload for this loader
u8* data;
usize size;
};

u8* indexBuffer;

// Minimum and maximum index in the index buffer for a draw call
u16 minimumIndex, maximumIndex;
u32 totalAttribCount;
u32 totalLoaderCount;
u32 enabledAttributeMask;
u32 fixedAttributes;
u32 vertexDataSize;

std::array<AttributeInfo, maxAttribCount> attributeInfo;
std::array<Loader, maxLoaderCount> loaders;

bool canBeAccelerated;
bool indexed;
bool useShortIndices;
};
} // namespace PICA
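
The struct above records a `minimumIndex` and `maximumIndex` for indexed draws, plus a `useShortIndices` flag. The code that fills them in is not part of this header, so the following is only a minimal sketch of how such a range could be derived. `findIndexRange` is a hypothetical helper, and the little-endian byte order and non-empty buffer are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper (not from the PR): scans `count` indices and returns
// {minimumIndex, maximumIndex}. Assumes count > 0 and that 16-bit indices
// are stored little-endian in the buffer.
inline std::pair<std::uint16_t, std::uint16_t> findIndexRange(
    const std::uint8_t* indexBuffer, std::size_t count, bool useShortIndices) {
    std::uint16_t minimum = 0xFFFF;
    std::uint16_t maximum = 0;

    for (std::size_t i = 0; i < count; i++) {
        std::uint16_t index;
        if (useShortIndices) {
            // Assemble the 16-bit index byte by byte to avoid unaligned reads
            index = static_cast<std::uint16_t>(indexBuffer[i * 2] | (indexBuffer[i * 2 + 1] << 8));
        } else {
            index = indexBuffer[i];
        }
        minimum = std::min(minimum, index);
        maximum = std::max(maximum, index);
    }
    return {minimum, maximum};
}
```

Knowing the range up front lets a renderer upload only the vertices between the two indices rather than the whole vertex array, which is presumably why the struct stores both values.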
10 changes: 9 additions & 1 deletion include/PICA/gpu.hpp
@@ -1,6 +1,7 @@
#pragma once
#include <array>

#include "PICA/draw_acceleration.hpp"
#include "PICA/dynapica/shader_rec.hpp"
#include "PICA/float_types.hpp"
#include "PICA/pica_vertex.hpp"
@@ -13,6 +14,12 @@
#include "memory.hpp"
#include "renderer.hpp"

enum class ShaderExecMode {
Interpreter, // Interpret shaders on the CPU
JIT, // Recompile shaders to CPU machine code
Hardware, // Recompile shaders to host shaders and run them on the GPU
};

class GPU {
static constexpr u32 regNum = 0x300;
static constexpr u32 extRegNum = 0x1000;
@@ -45,7 +52,7 @@ class GPU {
uint immediateModeVertIndex;
uint immediateModeAttrIndex; // Index of the immediate mode attribute we're uploading

- template <bool indexed, bool useShaderJIT>
+ template <bool indexed, ShaderExecMode mode>
void drawArrays();

// Silly method of avoiding linking problems. TODO: Change to something less silly
@@ -81,6 +88,7 @@ class GPU {
std::unique_ptr<Renderer> renderer;
PICA::Vertex getImmediateModeVertex();

void getAcceleratedDrawInfo(PICA::DrawAcceleration& accel, bool indexed);
public:
// 256 entries per LUT with each LUT as its own row forming a 2D image 256 * LUT_COUNT
// Encoded in PICA native format
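
Replacing the `bool useShaderJIT` template parameter with a `ShaderExecMode` enum means the draw routine is now instantiated once per execution mode. The call site is not shown in this header, so the sketch below is only an illustration of how a runtime setting could be mapped onto such a template. `drawArraysSketch` and `dispatchDraw` are hypothetical names, and the returned strings stand in for the real draw paths.

```cpp
#include <string>

enum class ShaderExecMode { Interpreter, JIT, Hardware };

// Stand-in for a draw routine templated on the execution mode. Each
// instantiation only contains the branch for its own mode.
template <bool indexed, ShaderExecMode mode>
std::string drawArraysSketch() {
    std::string path = indexed ? "indexed/" : "non-indexed/";
    if constexpr (mode == ShaderExecMode::Hardware) {
        path += "hardware";
    } else if constexpr (mode == ShaderExecMode::JIT) {
        path += "jit";
    } else {
        path += "interpreter";
    }
    return path;
}

// Hypothetical dispatcher: turns runtime values into template arguments
inline std::string dispatchDraw(bool indexed, ShaderExecMode mode) {
    switch (mode) {
        case ShaderExecMode::Hardware:
            return indexed ? drawArraysSketch<true, ShaderExecMode::Hardware>()
                           : drawArraysSketch<false, ShaderExecMode::Hardware>();
        case ShaderExecMode::JIT:
            return indexed ? drawArraysSketch<true, ShaderExecMode::JIT>()
                           : drawArraysSketch<false, ShaderExecMode::JIT>();
        case ShaderExecMode::Interpreter:
            return indexed ? drawArraysSketch<true, ShaderExecMode::Interpreter>()
                           : drawArraysSketch<false, ShaderExecMode::Interpreter>();
    }
    return {};
}
```

An enum scales better than a second or third `bool` parameter here: adding a mode means adding one enumerator and one `case`, and the compiler can warn about unhandled enumerators in the `switch`.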
57 changes: 57 additions & 0 deletions include/PICA/pica_vert_config.hpp
@@ -0,0 +1,57 @@
#pragma once
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>
#include <unordered_map>

#include "PICA/pica_hash.hpp"
#include "PICA/regs.hpp"
#include "PICA/shader.hpp"
#include "bitfield.hpp"
#include "helpers.hpp"

namespace PICA {
// Configuration struct used as the key for std::unordered_map lookups of vertex shaders (see the hash specialization below)
struct VertConfig {
PICAHash::HashType shaderHash;
PICAHash::HashType opdescHash;
u32 entrypoint;

// PICA registers for configuring shader output->fragment semantic mapping
std::array<u32, 7> outmaps{};
u16 outputMask;
u8 outputCount;
bool usingUbershader;

// Pad to 56 bytes so that the compiler won't insert unnecessary padding, which in turn will affect our unordered_map lookup
// As the padding will get hashed and memcmp'd...
u32 pad{};

bool operator==(const VertConfig& config) const {
// Hash function and equality operator required by std::unordered_map
return std::memcmp(this, &config, sizeof(VertConfig)) == 0;
}

VertConfig(PICAShader& shader, const std::array<u32, 0x300>& regs, bool usingUbershader) : usingUbershader(usingUbershader) {
shaderHash = shader.getCodeHash();
opdescHash = shader.getOpdescHash();
entrypoint = shader.entrypoint;

outputCount = regs[PICA::InternalRegs::ShaderOutputCount] & 7;
outputMask = regs[PICA::InternalRegs::VertexShaderOutputMask];
for (int i = 0; i < outputCount; i++) {
// Mask out unused bits
outmaps[i] = regs[PICA::InternalRegs::ShaderOutmap0 + i] & 0x1F1F1F1F;
}
}
};
} // namespace PICA

static_assert(sizeof(PICA::VertConfig) == 56);

// Override std::hash for our vertex config class
template <>
struct std::hash<PICA::VertConfig> {
std::size_t operator()(const PICA::VertConfig& config) const noexcept { return PICAHash::computeHash((const char*)&config, sizeof(config)); }
};
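
`VertConfig` is compared with `memcmp` and hashed over its raw bytes, which is why the explicit `pad` member and the `static_assert` on its size matter: any compiler-inserted padding would carry indeterminate bytes into both operations. The sketch below shows the same pattern on a much smaller struct. `KeySketch` and `lookupOrInsert` are hypothetical, and FNV-1a is used only as a stand-in for `PICAHash::computeHash`, whose implementation is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>

// Simplified stand-in for a byte-comparable cache key
struct KeySketch {
    std::uint64_t shaderHash = 0;
    std::uint32_t entrypoint = 0;
    std::uint16_t outputMask = 0;
    std::uint8_t outputCount = 0;
    bool usingUbershader = false;

    bool operator==(const KeySketch& other) const {
        return std::memcmp(this, &other, sizeof(KeySketch)) == 0;
    }
};

// 8 + 4 + 2 + 1 + 1 = 16 bytes: fails to compile if the compiler adds padding
static_assert(sizeof(KeySketch) == 16);

template <>
struct std::hash<KeySketch> {
    std::size_t operator()(const KeySketch& key) const noexcept {
        // FNV-1a over the raw bytes (assumption: stand-in for the real hash)
        unsigned char bytes[sizeof(KeySketch)];
        std::memcpy(bytes, &key, sizeof(key));
        std::uint64_t hash = 14695981039346656037ull;
        for (unsigned char byte : bytes) {
            hash ^= byte;
            hash *= 1099511628211ull;
        }
        return static_cast<std::size_t>(hash);
    }
};

// Returns the cached id for `key`, inserting `nextId` if the key is new
inline int lookupOrInsert(std::unordered_map<KeySketch, int>& cache, const KeySketch& key, int nextId) {
    return cache.try_emplace(key, nextId).first->second;
}
```

Two keys with identical fields always land on the same cache entry, while changing any single field produces a distinct entry.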
25 changes: 20 additions & 5 deletions include/PICA/shader.hpp
@@ -107,6 +107,11 @@ class PICAShader {
alignas(16) std::array<vec4f, 16> inputs; // Attributes passed to the shader
alignas(16) std::array<vec4f, 16> outputs;
alignas(16) vec4f dummy = vec4f({f24::zero(), f24::zero(), f24::zero(), f24::zero()}); // Dummy register used by the JIT

// We use a hashmap for matching 3DS shaders to their equivalent compiled code in our shader cache in the shader JIT
// We choose our hash type to be a 64-bit integer by default, as the collision chance is very tiny and generating it is decently optimal
// Ideally we want to be able to support multiple different types of hash depending on compilation settings, but let's get this working first
using Hash = PICAHash::HashType;

protected:
std::array<u32, 128> operandDescriptors;
@@ -125,14 +130,13 @@ class PICAShader {
std::array<CallInfo, 4> callInfo;
ShaderType type;

- // We use a hashmap for matching 3DS shaders to their equivalent compiled code in our shader cache in the shader JIT
- // We choose our hash type to be a 64-bit integer by default, as the collision chance is very tiny and generating it is decently optimal
- // Ideally we want to be able to support multiple different types of hash depending on compilation settings, but let's get this working first
- using Hash = PICAHash::HashType;

Hash lastCodeHash = 0; // Last hash computed for the shader code (Used for the JIT caching mechanism)
Hash lastOpdescHash = 0; // Last hash computed for the operand descriptors (Also used for the JIT)

public:
bool uniformsDirty = false;

protected:
bool codeHashDirty = false;
bool opdescHashDirty = false;

@@ -284,6 +288,7 @@ class PICAShader {
uniform[2] = f24::fromRaw(((floatUniformBuffer[0] & 0xff) << 16) | (floatUniformBuffer[1] >> 16));
uniform[3] = f24::fromRaw(floatUniformBuffer[0] >> 8);
}
uniformsDirty = true;
}
}

@@ -295,13 +300,23 @@ class PICAShader {
u[1] = getBits<8, 8>(word);
u[2] = getBits<16, 8>(word);
u[3] = getBits<24, 8>(word);
uniformsDirty = true;
}

void uploadBoolUniform(u32 value) {
boolUniform = value;
uniformsDirty = true;
}

void run();
void reset();

Hash getCodeHash();
Hash getOpdescHash();

// Returns how big the PICA uniforms are combined. Used for hw accelerated shaders where we upload the uniforms to our GPU.
static constexpr usize totalUniformSize() { return sizeof(floatUniforms) + sizeof(intUniforms) + sizeof(boolUniform); }
void* getUniformPointer() { return static_cast<void*>(&floatUniforms); }
};
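
The hunks above set `uniformsDirty` in every uniform upload path and expose `totalUniformSize()` and `getUniformPointer()` so the whole uniform block can be copied in one go. Where the flag gets cleared is not visible in this header, so the following is a minimal sketch of the dirty-flag pattern under assumptions. `UniformBlockSketch` is a heavily shrunk stand-in for `PICAShader`, and `syncUniforms` with its `std::vector` target is a hypothetical replacement for a real GPU buffer upload.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Shrunk stand-in: the three uniform members sit back to back in memory
struct UniformBlockSketch {
    std::array<float, 4> floatUniforms{};
    std::array<std::uint32_t, 4> intUniforms{};
    std::uint32_t boolUniform = 0;
    bool uniformsDirty = false;

    void uploadBoolUniform(std::uint32_t value) {
        boolUniform = value;
        uniformsDirty = true;  // Mark the block as needing a re-upload
    }

    static constexpr std::size_t totalUniformSize() {
        return sizeof(floatUniforms) + sizeof(intUniforms) + sizeof(boolUniform);
    }
    void* getUniformPointer() { return static_cast<void*>(&floatUniforms); }
};

// The single-memcpy trick only works if nothing sits between the members
static_assert(offsetof(UniformBlockSketch, boolUniform) + sizeof(std::uint32_t) == UniformBlockSketch::totalUniformSize());

// Hypothetical upload step: copies the block only when it changed.
// Returns true if an upload happened.
inline bool syncUniforms(UniformBlockSketch& shader, std::vector<std::uint8_t>& gpuBuffer) {
    if (!shader.uniformsDirty) {
        return false;
    }
    gpuBuffer.resize(UniformBlockSketch::totalUniformSize());
    std::memcpy(gpuBuffer.data(), shader.getUniformPointer(), gpuBuffer.size());
    shader.uniformsDirty = false;
    return true;
}
```

Skipping the copy when nothing changed avoids re-uploading the same uniform data on every draw call.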

static_assert(
19 changes: 14 additions & 5 deletions include/PICA/shader_decompiler.hpp
@@ -1,8 +1,11 @@
#pragma once
#include <fmt/format.h>

#include <map>
#include <set>
#include <string>
#include <tuple>
- #include <map>
#include <utility>
#include <vector>

#include "PICA/shader.hpp"
@@ -41,9 +44,12 @@ namespace PICA::ShaderGen {
explicit Function(u32 start, u32 end) : start(start), end(end) {}
bool operator<(const Function& other) const { return AddressRange(start, end) < AddressRange(other.start, other.end); }

- std::string getIdentifier() const { return "func_" + std::to_string(start) + "_to_" + std::to_string(end); }
- std::string getForwardDecl() const { return "void " + getIdentifier() + "();\n"; }
- std::string getCallStatement() const { return getIdentifier() + "()"; }
+ std::string getIdentifier() const { return fmt::format("fn_{}_{}", start, end); }
+ // To handle weird control flow, we have to return from each function a bool that indicates whether or not the shader reached an end
+ // instruction and should thus terminate. This is necessary for games like Rayman and Gravity Falls, which have "END" instructions called
+ // from within functions deep in the callstack
+ std::string getForwardDecl() const { return fmt::format("bool fn_{}_{}();\n", start, end); }
+ std::string getCallStatement() const { return fmt::format("fn_{}_{}()", start, end); }
};

std::set<Function> functions{};
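
The comment in this hunk explains that every decompiled function now returns a `bool` so that an `END` instruction hit deep in the call stack can terminate the whole shader. The generated shader code itself is not shown in this section, so the sketch below only mirrors the idea in C++. The function names follow the `fn_{start}_{end}` scheme from the diff, but the bodies, `ShaderTrace`, and `pica_main_sketch` are hypothetical.

```cpp
#include <string>
#include <vector>

// Records which steps ran, so the early exit is observable
struct ShaderTrace {
    std::vector<std::string> steps;
};

// Innermost function: hits END, so it reports "shader finished"
inline bool fn_20_30(ShaderTrace& trace) {
    trace.steps.push_back("fn_20_30");
    return true;  // END instruction reached
}

// Caller must check the result and propagate it upwards
inline bool fn_10_20(ShaderTrace& trace) {
    trace.steps.push_back("fn_10_20");
    if (fn_20_30(trace)) {
        return true;  // Propagate END instead of running the rest
    }
    trace.steps.push_back("rest of fn_10_20");
    return false;
}

inline void pica_main_sketch(ShaderTrace& trace) {
    if (fn_10_20(trace)) {
        return;  // Shader terminated inside a nested call
    }
    trace.steps.push_back("rest of main");
}
```

With `void` functions, as in the removed lines, a nested `END` could only leave the innermost function and execution would continue in the caller. Returning a flag is one straightforward way to unwind in a shading language without exceptions or non-local jumps.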
@@ -93,9 +99,11 @@ namespace PICA::ShaderGen {

API api;
Language language;
bool compilationError = false;

void compileInstruction(u32& pc, bool& finished);
- void compileRange(const AddressRange& range);
+ // Compiles range "range" and returns the end PC, along with whether we're "finished" with the program (hit an END instruction)
+ std::pair<u32, bool> compileRange(const AddressRange& range);
void callFunction(const Function& function);
const Function* findFunction(const AddressRange& range);

@@ -105,6 +113,7 @@ namespace PICA::ShaderGen {
std::string getDest(u32 dest) const;
std::string getSwizzlePattern(u32 swizzle) const;
std::string getDestSwizzle(u32 destinationMask) const;
const char* getCondition(u32 cond, u32 refX, u32 refY);

void setDest(u32 operandDescriptor, const std::string& dest, const std::string& value);
// Returns if the instruction uses the typical register encodings most instructions use
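
`compileRange` now returns a `std::pair<u32, bool>` holding the end PC and a "finished" flag. Its implementation lives in a source file outside this section, so the following is just a toy sketch of that return shape. `ToyOp` and `compileRangeSketch` are hypothetical, and treating the range end as exclusive is an assumption.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy instruction set: only what is needed to show the END handling
enum class ToyOp : std::uint8_t { Nop, End };

// Walks [start, end) and stops early at an END instruction.
// Returns {pc after the last processed instruction, whether END was hit}.
inline std::pair<std::uint32_t, bool> compileRangeSketch(
    const std::vector<ToyOp>& program, std::uint32_t start, std::uint32_t end) {
    std::uint32_t pc = start;
    bool finished = false;

    while (pc < end && pc < program.size() && !finished) {
        finished = (program[pc] == ToyOp::End);
        pc++;
    }
    return {pc, finished};
}
```

A caller can unpack the result with structured bindings, for example `auto [endPC, finished] = compileRangeSketch(program, 0, 4);`, and use `finished` to decide whether to emit code for the instructions that follow.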
3 changes: 3 additions & 0 deletions include/PICA/shader_gen.hpp
@@ -3,6 +3,7 @@

#include "PICA/gpu.hpp"
#include "PICA/pica_frag_config.hpp"
#include "PICA/pica_vert_config.hpp"
#include "PICA/regs.hpp"
#include "PICA/shader_gen_types.hpp"
#include "helpers.hpp"
@@ -30,6 +31,8 @@ namespace PICA::ShaderGen {
FragmentGenerator(API api, Language language) : api(api), language(language) {}
std::string generate(const PICA::FragmentConfig& config);
std::string getDefaultVertexShader();
// For when PICA shader acceleration is enabled. Turns the PICA shader source into a proper vertex shader
std::string getVertexShaderAccelerated(const std::string& picaSource, const PICA::VertConfig& vertConfig, bool usingUbershader);

void setTarget(API api, Language language) {
this->api = api;
7 changes: 3 additions & 4 deletions include/PICA/shader_unit.hpp
@@ -2,10 +2,9 @@
#include "PICA/shader.hpp"

class ShaderUnit {
-
- public:
- PICAShader vs; // Vertex shader
- PICAShader gs; // Geometry shader
+ public:
+ PICAShader vs; // Vertex shader
+ PICAShader gs; // Geometry shader

ShaderUnit() : vs(ShaderType::Vertex), gs(ShaderType::Geometry) {}
void reset();