Running Python UDFs in Native NVIDIA CUDA Kernels with RAPIDS cuDF

In this post, I introduce the design and implementation of a framework within RAPIDS cuDF that enables compiling Python user-defined functions (UDFs) and inlining them into native CUDA kernels. This framework uses the Numba Python compiler and the Jitify CUDA just-in-time (JIT) compilation library to provide cuDF users the flexibility of Python with the performance of CUDA as a compiled language.

An essential part of the framework is a parser that parses a CUDA PTX function, which is compiled from the Python UDF, into an equivalent CUDA C++ device function that can be inlined into native CUDA C++ kernels. This approach makes it possible for Python users without CUDA programming knowledge to extend optimized DataFrame operations with their own Python UDFs, and enables more flexibility and generality for high-performance computations on DataFrames in RAPIDS.

I start by giving examples of how to use the feature, followed by its goals. Finally, I explain how things work in the background to make the feature possible.

Using the feature

The feature is built into the framework of RAPIDS cuDF and is easy to use. After a DataFrame is created, call the interfaces that support this feature with the user-defined Python function. Currently, the list of supported interfaces includes:

  • applymap, which applies a UDF to each of the elements.
  • rolling, which applies a range-based UDF to each of the windows.
    However, performance is achieved at the price of flexibility. At compile time, the operator function is often not known. In most cases, the program does not reach end users until runtime, and it is the users who decide what operator function is needed. With ahead-of-time (AOT) compilation, you do not have the ability to write your own operator function without recompiling the whole program while still having the maximum performance.
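A minimal usage sketch of the interfaces listed above, with a made-up UDF (`double_if_positive`). The cuDF calls shown in comments assume a CUDA-capable environment with cuDF installed; the pandas lines mirror the same semantics on CPU so the sketch runs anywhere.

```python
# Hedged sketch: `double_if_positive` is an illustrative UDF, not from the
# original post. With cuDF on a CUDA-capable machine, the commented calls
# run the same UDF inside native CUDA kernels.
import pandas as pd

def double_if_positive(x):
    return 2 * x if x > 0 else x

s = pd.Series([-1.0, 2.0, 3.0])

# cuDF (GPU): gs = cudf.Series([-1.0, 2.0, 3.0]); gs.applymap(double_if_positive)
out_map = s.map(double_if_positive)   # elementwise UDF

# cuDF (GPU): gs.rolling(2).apply(window_udf)  -- range-based UDF per window
out_roll = s.rolling(2).sum()         # window-based operation

print(out_map.tolist())   # [-1.0, 4.0, 6.0]
print(out_roll.tolist())  # first window is incomplete, so NaN
```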

    JIT compilation

    JIT compilation, or runtime compilation, comes to help. Using CUDA runtime compilation (NVRTC) and the Jitify library, you can inline the code string of the operator function written at runtime into the code string of the kernel, before the combination is compiled at runtime. The resulting kernel launches with the same performance as a corresponding traditional, native CUDA kernel. Flexibility and performance are achieved at the same time, with the only overhead being the time needed to perform the runtime compilation.
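The inlining step can be illustrated with plain string composition (a simplified sketch: the kernel template and operator below are made up, and real code would hand the combined source to NVRTC via Jitify for compilation).

```python
# Simplified sketch of runtime source inlining: the operator body is only
# known at runtime, so it is spliced into a predefined kernel template as
# a string. In cuDF the combined source would then be JIT-compiled with
# NVRTC/Jitify; here we only build the source string.
KERNEL_TEMPLATE = """
{udf_source}

__global__ void transform_kernel(const double* in, double* out, int n) {{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = udf(in[i]);
}}
"""

# Operator function decided at runtime (e.g., derived from a user's UDF):
udf_source = "__device__ double udf(double x) { return x > 0 ? 2 * x : x; }"

combined_source = KERNEL_TEMPLATE.format(udf_source=udf_source)
print(combined_source)
```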

    Combine Python and CUDA

    By combining Python, with its flexibility as an interpreted language, and CUDA, with its performance as a compiled language, you get broader coverage. You can write a Python UDF without any knowledge, or even awareness, of CUDA, and this feature compiles it, inlines it into carefully optimized, predefined CUDA kernels, and then launches it on NVIDIA GPUs with maximum performance, as shown in the usage examples.

    Performance benchmark for applymap

    For DataFrames with large numbers of rows, I compared the performance of pandas.apply with cudf.applymap. The latter can achieve significant speedups over the former. The following benchmark was measured on an Intel Xeon Gold 6128 CPU and an NVIDIA Quadro GV100 GPU.

  • Function header

    CUDA PTX and CUDA C++ have different grammars for how the function header should be written. In practice, it is assumed that the CUDA PTX function has a dummy return value. The actual output value of the function is to be written to the memory to which the first function parameter points, as it is always interpreted as a pointer. The numba-compiled CUDA PTX functions satisfy this assumption. Because the return type of the CUDA PTX function is a dummy, the output CUDA C++ functions always have a void return type.

    The output type must be known from the user. For a CUDA PTX function parameter, there is no way to interpret the type of the memory it points to. The necessity of this information is expected, as the forward compiling process loses information; to go backward, the process needs additional user input.

    The rest of the function parameters (from the second parameter on) are all considered input values of the function. User input for their types is not needed, as these types can be inferred from the parameter-loading instructions in the function body. As an exception to that, the user must tell the workflow if any of the remaining parameters is a pointer.
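The type-inference step described above can be illustrated with a toy parser that reads `ld.param.*` instructions (a heavily simplified sketch: the PTX snippet, parameter names, and type table are illustrative, not the real cuDF parser).

```python
# Toy sketch of inferring input-parameter types from PTX parameter-loading
# instructions (ld.param.*). Real numba-generated PTX is more involved;
# the snippet and mapping below are illustrative only.
import re

PTX_TYPES_TO_CPP = {
    "f32": "float",
    "f64": "double",
    "s32": "int",
    "s64": "long long",
    "u64": "unsigned long long",  # e.g., the output pointer parameter
}

ptx_body = """
ld.param.u64 %rd1, [udf_param_0];
ld.param.f64 %fd1, [udf_param_1];
ld.param.s32 %r1,  [udf_param_2];
"""

def infer_param_types(ptx: str) -> dict:
    """Map each parameter name to a CUDA C++ type from its ld.param load."""
    types = {}
    for m in re.finditer(r"ld\.param\.(\w+)\s+%\S+\s*,\s*\[(\w+)\]", ptx):
        ptx_type, name = m.group(1), m.group(2)
        types[name] = PTX_TYPES_TO_CPP[ptx_type]
    return types

print(infer_param_types(ptx_body))
# {'udf_param_0': 'unsigned long long', 'udf_param_1': 'double',
#  'udf_param_2': 'int'}
```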
