ACCL

ACCL::ACCL

class ACCL

Main ACCL class that talks to the CCLO on hardware or emulation/simulation.

Public Functions

ACCL(xrt::device &device, xrt::ip &cclo_ip, xrt::kernel &hostctrl_ip, int devicemem, const std::vector<int> &rxbufmem, const arithConfigMap &arith_config = DEFAULT_ARITH_CONFIG)

Construct a new ACCL object that talks to hardware.

Parameters:
  • device – FPGA device on which the CCLO lives

  • cclo_ip – The CCLO kernel on the FPGA

  • hostctrl_ip – The hostctrl kernel on the FPGA

  • devicemem – Memory bank of device memory

  • rxbufmem – Memory banks of rxbuf memory

  • arith_config – Arithmetic configuration to use

ACCL(unsigned int start_port, unsigned int local_rank, const arithConfigMap &arith_config = DEFAULT_ARITH_CONFIG)

Construct a new ACCL object that talks to the ACCL emulator/simulator.

Parameters:
  • start_port – First port to use to connect to the ACCL emulator/simulator

  • local_rank – Rank of this process

  • arith_config – Arithmetic configuration to use

ACCL(CoyoteDevice *dev, const arithConfigMap &arith_config = DEFAULT_ARITH_CONFIG)

Construct a new ACCL object on Coyote.

Parameters:
  • dev – Coyote device object

  • arith_config – Arithmetic configuration to use

~ACCL()

Destroy the ACCL object.

Automatically deinitializes the CCLO.

void soft_reset()

Performs a soft reset of the CCLO.

void deinit()

Deinitializes the CCLO.

void initialize(const std::vector<rank_t> &ranks, int local_rank, int n_egr_rx_bufs = 16, addr_t egr_rx_buf_size = 1024, addr_t max_egr_size = 1024, addr_t max_rndzv_size = 32 * 1024)

Initializes ACCL.

inline val_t get_retcode()

Get the return code of the last ACCL call.

Returns:

val_t The return code

inline val_t get_hwid()

Get the hardware id from the FPGA.

Returns:

val_t The hardware id

void parse_hwid()

Parse the hardware id from the FPGA.

ACCLRequest *set_timeout(unsigned int value, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Set the timeout of ACCL calls.

Parameters:
  • value – Timeout in milliseconds

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *nop(bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the nop operation on the FPGA.

Parameters:
  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *send(BaseBuffer &srcbuf, unsigned int count, unsigned int dst, unsigned int tag = TAG_ANY, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the send operation on the FPGA.

Parameters:
  • srcbuf – Buffer that contains the data to be sent. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements in buffer to send.

  • dst – Destination rank to send data to.

  • tag – Tag of send operation.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *send(dataType src_data_type, unsigned int count, unsigned int dst, unsigned int tag = TAG_ANY, communicatorId comm_id = GLOBAL_COMM, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the send operation on the FPGA using the data stream of the CCLO as input.

Parameters:
  • src_data_type – Data type of the input.

  • count – Amount of elements in buffer to send.

  • dst – Destination rank to send data to.

  • tag – Tag of send operation.

  • comm_id – Index of communicator to use.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *stream_put(BaseBuffer &srcbuf, unsigned int count, unsigned int dst, unsigned int stream_id, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs a one-sided put to a stream on a remote FPGA.

Parameters:
  • srcbuf – Buffer that contains the data to be sent. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements in buffer to send.

  • dst – Destination rank to send data to.

  • stream_id – ID of target stream on destination rank. IDs 0-8 are reserved, throws exception if set in this range.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status
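
The reserved range for stream_id described above can be sketched as a standalone check. This is an illustration only: is_reserved_stream_id and check_stream_id are hypothetical helpers, not ACCL functions, and because the exception type ACCL throws is not stated here, std::invalid_argument is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical constant (not an ACCL symbol): IDs 0-8 are reserved.
constexpr unsigned int kLastReservedStreamId = 8;

// Returns true if the ID falls inside the reserved range 0-8.
constexpr bool is_reserved_stream_id(unsigned int stream_id) {
  return stream_id <= kLastReservedStreamId;
}

// Returns the ID unchanged if it is usable, otherwise throws.
// The exception type is an assumption made for this sketch.
inline unsigned int check_stream_id(unsigned int stream_id) {
  if (is_reserved_stream_id(stream_id)) {
    throw std::invalid_argument("stream_id " + std::to_string(stream_id) +
                                " is reserved (0-8)");
  }
  return stream_id;
}
```

Validating the ID on the host before calling stream_put keeps the failure close to the code that chose the ID.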

ACCLRequest *stream_put(dataType src_data_type, unsigned int count, unsigned int dst, unsigned int stream_id, communicatorId comm_id = GLOBAL_COMM, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs a one-sided put to a stream on a remote FPGA using the data stream of the CCLO as input.

Parameters:
  • src_data_type – Data type of the input.

  • count – Amount of elements in buffer to send.

  • dst – Destination rank to send data to.

  • stream_id – ID of target stream on destination rank. IDs 0-8 are reserved, throws exception if set in this range.

  • comm_id – Index of communicator to use.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *recv(BaseBuffer &dstbuf, unsigned int count, unsigned int src, unsigned int tag = TAG_ANY, communicatorId comm_id = GLOBAL_COMM, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the receive operation on the FPGA.

Parameters:
  • dstbuf – Buffer where the received data should be stored. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements to receive.

  • src – Source rank to receive data from.

  • tag – Tag of receive operation.

  • comm_id – Index of communicator to use.

  • to_fpga – Set to true if the data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *recv(dataType dst_data_type, unsigned int count, unsigned int src, unsigned int tag = TAG_ANY, communicatorId comm_id = GLOBAL_COMM, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the receive operation on the FPGA and directs the received data to the data stream of the CCLO.

Parameters:
  • dst_data_type – Data Type of the received data.

  • count – Amount of elements to receive.

  • src – Source rank to receive data from.

  • tag – Tag of receive operation.

  • comm_id – Index of communicator to use.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *copy(BaseBuffer &srcbuf, BaseBuffer &dstbuf, unsigned int count, bool from_fpga = false, bool to_fpga = false, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Copy a buffer on the FPGA.

Parameters:
  • srcbuf – Buffer that contains the data to be copied. Create a buffer using ACCL::create_buffer.

  • dstbuf – Buffer where the copied data should be stored. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements in buffer to copy.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the copied data will be used on the FPGA only.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *copy_from_stream(BaseBuffer &dstbuf, unsigned int count, bool to_fpga = false, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Copy data from the data stream of the CCLO into a buffer on the FPGA.

Parameters:
  • dstbuf – Buffer where the copied data should be stored. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements in buffer to copy.

  • to_fpga – Set to true if the copied data will be used on the FPGA only.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *copy_to_stream(BaseBuffer &srcbuf, unsigned int count, bool from_fpga = false, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Copy a buffer on the FPGA to the data stream of the CCLO.

Parameters:
  • srcbuf – Buffer that contains the data to be copied. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements in buffer to copy.

  • from_fpga – Set to true if the data is already on the FPGA.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *copy_from_to_stream(dataType dst_data_type, unsigned int count, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Copy data from stream to stream on the FPGA.

Parameters:
  • dst_data_type – Data type of input and output to stream.

  • count – Amount of elements in buffer to copy.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *combine(unsigned int count, reduceFunction function, BaseBuffer &val1, BaseBuffer &val2, BaseBuffer &result, bool val1_from_fpga = false, bool val2_from_fpga = false, bool to_fpga = false, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Perform reduce operation on two buffers on the FPGA.

Parameters:
  • count – Amount of elements to perform reduce operation on.

  • function – Reduce operation to perform.

  • val1 – First buffer that should be used for reduce operation. Create a buffer using ACCL::create_buffer.

  • val2 – Second buffer that should be used for reduce operation. Create a buffer using ACCL::create_buffer.

  • result – Buffer where the result should be stored. Create a buffer using ACCL::create_buffer.

  • val1_from_fpga – Set to true if the data of the first buffer is already on the FPGA.

  • val2_from_fpga – Set to true if the data of the second buffer is already on the FPGA.

  • to_fpga – Set to true if the copied data will be used on the FPGA only.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status
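
As a host-side reference for checking results, combine can be modeled as an element-wise reduction over the first count elements of the two inputs. This is a sketch under assumptions: ReduceOp is a local stand-in and not ACCL's reduceFunction type, and combine_reference is a hypothetical helper that works on std::vector instead of ACCL buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in for the reduce operation; not ACCL's reduceFunction.
enum class ReduceOp { Sum, Max };

// Host-side model: result[i] = op(val1[i], val2[i]) for i in [0, count).
template <typename T>
std::vector<T> combine_reference(std::size_t count, ReduceOp op,
                                 const std::vector<T> &val1,
                                 const std::vector<T> &val2) {
  assert(val1.size() >= count && val2.size() >= count);
  std::vector<T> result(count);
  for (std::size_t i = 0; i < count; ++i) {
    result[i] = (op == ReduceOp::Sum) ? val1[i] + val2[i]
                                      : std::max(val1[i], val2[i]);
  }
  return result;
}
```

Comparing the contents of the result buffer against such a model is one way to validate a run in the emulator before moving to hardware.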

ACCLRequest *bcast(BaseBuffer &buf, unsigned int count, unsigned int root, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the broadcast operation on the FPGA.

Parameters:
  • buf – Buffer that should contain the same data as the root after the operation. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements in buffer to broadcast.

  • root – Rank to broadcast the data from.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the copied data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *scatter(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, unsigned int root, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the scatter operation on the FPGA.

Parameters:
  • sendbuf – Buffer of count × world size elements that contains the data to be scattered. Create a buffer using ACCL::create_buffer. You can pass a DummyBuffer on non-root ranks.

  • recvbuf – Buffer of count elements where the scattered data should be stored. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements to scatter per rank.

  • root – Rank to scatter the data from.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the scattered data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status
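
The buffer sizes above (count × world size elements on the root, count elements per receiver) can be illustrated with a host-side model. This sketch assumes the usual MPI-style layout in which chunk r of the root's send buffer goes to rank r; that ordering is an assumption, and scatter_reference is a hypothetical helper, not an ACCL function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of scatter. Element r of the returned vector is the
// receive buffer of rank r. Assumes chunk r of the root buffer goes to rank r.
template <typename T>
std::vector<std::vector<T>> scatter_reference(const std::vector<T> &root_sendbuf,
                                              std::size_t count,
                                              std::size_t world_size) {
  assert(root_sendbuf.size() == count * world_size);
  std::vector<std::vector<T>> recvbufs(world_size);
  for (std::size_t rank = 0; rank < world_size; ++rank) {
    auto first = root_sendbuf.begin() + static_cast<std::ptrdiff_t>(rank * count);
    recvbufs[rank].assign(first, first + static_cast<std::ptrdiff_t>(count));
  }
  return recvbufs;
}
```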

ACCLRequest *gather(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, unsigned int root, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the gather operation on the FPGA.

Parameters:
  • sendbuf – Buffer of count elements that contains the data to be gathered. Create a buffer using ACCL::create_buffer.

  • recvbuf – Buffer of count × world size elements into which the data should be gathered. Create a buffer using ACCL::create_buffer. You can pass a DummyBuffer on non-root ranks.

  • count – Amount of elements to gather per rank.

  • root – Rank to gather the data to.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the gathered data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status
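
Gather is the inverse layout: each rank contributes count elements and the root ends up with count × world size elements. The model below assumes the contributions are concatenated in rank order (the MPI convention, assumed here); gather_reference is a hypothetical host-side helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of gather. sendbufs[r] is the send buffer of rank r; the
// return value is the root's receive buffer, assuming rank-order concatenation.
template <typename T>
std::vector<T> gather_reference(const std::vector<std::vector<T>> &sendbufs,
                                std::size_t count) {
  std::vector<T> root_recvbuf;
  root_recvbuf.reserve(count * sendbufs.size());
  for (const auto &sendbuf : sendbufs) {
    assert(sendbuf.size() >= count);
    root_recvbuf.insert(root_recvbuf.end(), sendbuf.begin(),
                        sendbuf.begin() + static_cast<std::ptrdiff_t>(count));
  }
  return root_recvbuf;
}
```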

ACCLRequest *allgather(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the allgather operation on the FPGA.

Parameters:
  • sendbuf – Buffer of count elements that contains the data to be gathered. Create a buffer using ACCL::create_buffer.

  • recvbuf – Buffer of count × world size elements into which the data should be gathered. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements to gather per rank.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the gathered data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *reduce(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, unsigned int root, reduceFunction func, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the reduce operation on the FPGA.

Parameters:
  • sendbuf – Buffer that contains the data to be reduced. Create a buffer using ACCL::create_buffer.

  • recvbuf – Buffer where the reduced data should be stored. Create a buffer using ACCL::create_buffer. You can pass a DummyBuffer on non-root ranks.

  • count – Amount of elements to reduce.

  • root – Rank to reduce the data to.

  • func – Reduce function to use.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the reduced data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *reduce(dataType src_data_type, BaseBuffer &recvbuf, unsigned int count, unsigned int root, reduceFunction func, communicatorId comm_id = GLOBAL_COMM, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the reduce operation on the FPGA from data in local stream.

Parameters:
  • src_data_type – Data type of the input.

  • recvbuf – Buffer where the reduced data should be stored. Create a buffer using ACCL::create_buffer. You can pass a DummyBuffer on non-root ranks.

  • count – Amount of elements to reduce.

  • root – Rank to reduce the data to.

  • func – Reduce function to use.

  • comm_id – Index of communicator to use.

  • to_fpga – Set to true if the reduced data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *reduce(BaseBuffer &sendbuf, dataType dst_data_type, unsigned int count, unsigned int root, reduceFunction func, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the memory to stream reduce operation on the FPGA.

Parameters:
  • sendbuf – Buffer that contains the data to be reduced. Create a buffer using ACCL::create_buffer.

  • dst_data_type – Data type of the output.

  • count – Amount of elements to reduce.

  • root – Rank to reduce the data to.

  • func – Reduce function to use.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *reduce(dataType src_data_type, dataType dst_data_type, unsigned int count, unsigned int root, reduceFunction func, communicatorId comm_id = GLOBAL_COMM, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the stream to stream reduce operation on the FPGA.

Parameters:
  • src_data_type – Data type of the input.

  • dst_data_type – Data type of the output.

  • count – Amount of elements to reduce.

  • root – Rank to reduce the data to.

  • func – Reduce function to use.

  • comm_id – Index of communicator to use.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *allreduce(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, reduceFunction func, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the allreduce operation on the FPGA.

Parameters:
  • sendbuf – Buffer that contains the data to be reduced. Create a buffer using ACCL::create_buffer.

  • recvbuf – Buffer where the reduced data should be stored. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements to reduce.

  • func – Reduce function to use.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the reduced data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status
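
For result checking, allreduce can be modeled on the host: every rank ends up with the same element-wise reduction over all ranks' send buffers. The sketch below fixes the reduce function to a sum for brevity; allreduce_sum_reference is a hypothetical helper that uses std::vector in place of ACCL buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of allreduce with a sum. sendbufs[r] is the send buffer of
// rank r; element r of the return value is the receive buffer of rank r.
template <typename T>
std::vector<std::vector<T>> allreduce_sum_reference(
    const std::vector<std::vector<T>> &sendbufs, std::size_t count) {
  std::vector<T> reduced(count, T{});
  for (const auto &sendbuf : sendbufs) {
    assert(sendbuf.size() >= count);
    for (std::size_t i = 0; i < count; ++i) {
      reduced[i] += sendbuf[i];
    }
  }
  // Every rank receives its own copy of the reduced vector.
  return std::vector<std::vector<T>>(sendbufs.size(), reduced);
}
```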

ACCLRequest *reduce_scatter(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, reduceFunction func, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs the reduce_scatter operation on the FPGA.

Parameters:
  • sendbuf – Buffer of count × world size elements that contains the data to be reduced. Create a buffer using ACCL::create_buffer.

  • recvbuf – Buffer of count elements where the reduced data should be stored. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements to reduce per rank.

  • func – Reduce function to use.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the reduced data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status
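
The size relationship for reduce_scatter (count × world size elements in, count elements out per rank) can be made concrete with a host-side model. It assumes the MPI-style convention that rank r receives the reduction of chunk r of every send buffer, and it fixes the reduce function to a sum; both are assumptions of this sketch, and reduce_scatter_sum_reference is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of reduce_scatter with a sum. sendbufs[r] is the send buffer
// of rank r (count * world_size elements); element r of the return value is
// the receive buffer of rank r (count elements).
template <typename T>
std::vector<std::vector<T>> reduce_scatter_sum_reference(
    const std::vector<std::vector<T>> &sendbufs, std::size_t count) {
  const std::size_t world_size = sendbufs.size();
  std::vector<std::vector<T>> recvbufs(world_size, std::vector<T>(count, T{}));
  for (const auto &sendbuf : sendbufs) {
    assert(sendbuf.size() == count * world_size);
    for (std::size_t rank = 0; rank < world_size; ++rank) {
      for (std::size_t i = 0; i < count; ++i) {
        recvbufs[rank][i] += sendbuf[rank * count + i];
      }
    }
  }
  return recvbufs;
}
```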

ACCLRequest *alltoall(BaseBuffer &sendbuf, BaseBuffer &recvbuf, unsigned int count, communicatorId comm_id = GLOBAL_COMM, bool from_fpga = false, bool to_fpga = false, dataType compress_dtype = dataType::none, bool run_async = false, std::vector<ACCLRequest*> waitfor = {})

Performs an alltoall shuffle operation on the FPGA.

Parameters:
  • sendbuf – Buffer of count elements that contains the data to be shuffled. Create a buffer using ACCL::create_buffer.

  • recvbuf – Buffer of count × world size elements into which the data should be gathered. Create a buffer using ACCL::create_buffer.

  • count – Amount of elements to shuffle per rank.

  • comm_id – Index of communicator to use.

  • from_fpga – Set to true if the data is already on the FPGA.

  • to_fpga – Set to true if the gathered data will be used on the FPGA only.

  • compress_dtype – Datatype to compress buffers to over ethernet.

  • run_async – Run the ACCL call asynchronously.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

ACCLRequest *barrier(communicatorId comm_id = GLOBAL_COMM, std::vector<ACCLRequest*> waitfor = {})

Performs a barrier on the FPGA.

Parameters:
  • comm_id – Index of communicator to use.

  • waitfor – ACCL call will wait for these operations before it will start. Currently not implemented.

Returns:

ACCLRequest* Request object used for waiting and checking for operation status

void wait(ACCLRequest *request)

Wait for an ACCLRequest.

Blocks the caller until the request completes.

Parameters:

request – Reference to the ACCLRequest to wait for

bool wait(ACCLRequest *request, std::chrono::milliseconds timeout)

Wait for an ACCLRequest.

Blocks the caller until the request completes or the timeout expires.

Parameters:
  • request – Reference to the ACCLRequest to wait for

  • timeout – Maximum time to wait, in milliseconds

Returns:

true If the request completed

Returns:

false If the request did not complete

bool test(ACCLRequest *request)

Check if given request is completed.

Parameters:

request – Reference to the ACCLRequest to test

Returns:

true If the request completed

Returns:

false If the request did not complete
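
The contract of the timed wait (true on completion, false on timeout) can be modeled by polling a completion predicate until a deadline, which is roughly what a caller would do by hand with test. This is a sketch only: wait_until_done is a hypothetical helper, the predicate stands in for a call such as test(request), and the 50 microsecond polling interval is an arbitrary choice.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Polls is_done() until it returns true or the timeout expires.
// Returns true if completion was observed, false if the deadline passed first.
template <typename Pred>
bool wait_until_done(Pred is_done, std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!is_done()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    // Back off briefly between polls; the interval is arbitrary.
    std::this_thread::sleep_for(std::chrono::microseconds(50));
  }
  return true;
}
```

When the built-in timed wait is sufficient it is the simpler choice; a polling loop like this is mainly useful when the host has other work to interleave between checks.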

uint64_t get_duration(ACCLRequest *request)

Get the duration of the call associated with the request.

Parameters:

request – Reference to the ACCLRequest to query

Returns:

uint64_t Call duration in nanoseconds
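
Since the duration is reported in nanoseconds, it converts directly into an effective throughput: bytes per nanosecond is numerically equal to GB/s (10^9 bytes per second). The helper below is a hypothetical sketch; element_bytes is whatever element size the caller used for the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Converts an element count, an element size, and a call duration in
// nanoseconds into GB/s (bytes per nanosecond == 1e9 bytes per second).
inline double throughput_gb_per_s(std::size_t count, std::size_t element_bytes,
                                  std::uint64_t duration_ns) {
  assert(duration_ns > 0);
  return static_cast<double>(count * element_bytes) /
         static_cast<double>(duration_ns);
}
```

For example, 1000 four-byte elements moved in 4000 ns corresponds to 1 GB/s.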

void free_request(ACCLRequest *request)

Free the given request.

Parameters:

request – Reference to the ACCLRequest to free

inline bool is_simulated() const

Check if ACCL is being run in simulated mode or not.

Returns:

true ACCL is running on an emulator or simulator.

Returns:

false ACCL is running on hardware.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer(size_t length, dataType type)

Construct a new device buffer object without an existing host buffer.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • length – Amount of elements to allocate for.

  • type – ACCL datatype of the buffer.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer_host(size_t length, dataType type)

Construct a new host buffer object.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • length – Amount of elements to allocate for.

  • type – ACCL datatype of the buffer.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer(size_t length, dataType type, unsigned mem_grp)

Construct a new buffer object without an existing host buffer on the specified memory bank.

Only use this function if you want to store the buffer on a different memory bank than the devicemem bank specified during construction.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • length – Amount of elements to allocate for.

  • type – ACCL datatype of the buffer.

  • mem_grp – Memory bank to allocate buffer on.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer(dtype *host_buffer, size_t length, dataType type)

Construct a new buffer object from an existing host pointer.

On hardware, the host pointer must be aligned to 4096 bytes. If a non-aligned host pointer is provided and ACCL is running on hardware, ACCL will keep its own aligned host buffer and copy between the unaligned and aligned host buffers when required. It is recommended to provide an aligned host pointer to avoid unnecessary memory copies.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • host_buffer – The host pointer containing the data.

  • length – Amount of elements in the host buffer.

  • type – ACCL datatype of the buffer.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.
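
One portable way to obtain a host pointer that meets the 4096-byte alignment requirement is to over-allocate and align inside the allocation with std::align. The sketch below is an illustration, not part of ACCL: AlignedHostArray and is_host_aligned are hypothetical names, and the class is intended for trivially copyable element types such as float or int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>
#include <vector>

constexpr std::size_t kHostAlignment = 4096;

// Owns raw storage and exposes a 4096-byte-aligned pointer into it.
template <typename T>
class AlignedHostArray {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch is meant for plain element types");

 public:
  explicit AlignedHostArray(std::size_t length)
      : storage_(length * sizeof(T) + kHostAlignment), data_(nullptr) {
    void *p = storage_.data();
    std::size_t space = storage_.size();
    // std::align advances p to the next aligned address that still leaves
    // room for length * sizeof(T) bytes; the padding guarantees it fits.
    data_ = static_cast<T *>(
        std::align(kHostAlignment, length * sizeof(T), p, space));
  }
  // Copying would leave data_ pointing into the other object's storage.
  AlignedHostArray(const AlignedHostArray &) = delete;
  AlignedHostArray &operator=(const AlignedHostArray &) = delete;

  T *data() { return data_; }

 private:
  std::vector<unsigned char> storage_;
  T *data_;
};

// Returns true if p is aligned to kHostAlignment bytes.
inline bool is_host_aligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % kHostAlignment == 0;
}
```

The pointer returned by data() could then be passed as host_buffer to create_buffer(dtype *host_buffer, size_t length, dataType type); presumably the array has to outlive the ACCL buffer created from it.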

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer(dtype *host_buffer, size_t length, dataType type, unsigned mem_grp)

Construct a new buffer object from an existing host pointer on the specified memory bank.

Only use this function if you want to store the buffer on a different memory bank than the devicemem bank specified during construction.

On hardware, the host pointer must be aligned to 4096 bytes. If a non-aligned host pointer is provided and ACCL is running on hardware, ACCL will keep its own aligned host buffer and copy between the unaligned and aligned host buffers when required. It is recommended to provide an aligned host pointer to avoid unnecessary memory copies.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • host_buffer – The host pointer containing the data.

  • length – Amount of elements in the host buffer.

  • type – ACCL datatype of the buffer.

  • mem_grp – Memory bank to allocate buffer on.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer(xrt::bo &bo, size_t length, dataType type)

Construct a new buffer object from an existing BO buffer.

When using an ACCL emulator or simulator, this function can be used to pass a simulated BO buffer from the Vitis emulator and use the Vitis emulator together with the ACCL emulator. In this case, ACCL will also create a new internal simulated BO buffer to copy data between the simulated BO buffer and the simulated ACCL buffer when required.

When running on hardware, ACCL will simply use this BO buffer internally, instead of allocating a new one.

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • bo – The BO buffer to use.

  • length – Amount of elements in the BO buffer.

  • type – ACCL datatype of the buffer.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer_p2p(size_t length, dataType type)

Construct a new p2p buffer object.

Will create a normal buffer when running in simulated mode.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer_p2p(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • length – Amount of elements to allocate for.

  • type – ACCL datatype of the buffer.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated P2P buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_buffer_p2p(size_t length, dataType type, unsigned mem_grp)

Construct a new p2p buffer object on the specified memory bank.

Will create a normal buffer when running in simulated mode.

Only use this function if you want to store the buffer on a different memory bank than the devicemem bank specified during construction.

Note that when running in simulated mode, this constructor will not create an underlying simulated BO buffer. If you need this functionality, use create_buffer_p2p(xrt::bo &, size_t, dataType).

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • length – Amount of elements to allocate for.

  • type – ACCL datatype of the buffer.

  • mem_grp – Memory bank to allocate buffer on.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated P2P buffer.

template<typename dtype>
inline std::unique_ptr<Buffer<dtype>> create_coyotebuffer(size_t length, dataType type)

Construct a new coyote buffer object without an existing host buffer.

Coyote buffer objects have no notion of memory banks.

Template Parameters:

dtype – Datatype of the buffer.

Parameters:
  • length – Amount of elements to allocate for.

  • type – ACCL datatype of the buffer.

Returns:

std::unique_ptr<Buffer<dtype>> The allocated buffer.

std::string dump_exchange_memory()

Dump the content of the exchange memory to a string.

Returns:

std::string Content of the exchange memory.

std::string dump_eager_rx_buffers(size_t n_egr_rx_bufs, bool dump_data = true)

Dump the content of the Eager-mode RX buffers to a string for the first n_egr_rx_bufs buffers.

Parameters:
  • n_egr_rx_bufs – Amount of buffers to dump the content of.

  • dump_data – Dump buffer contents along with metadata.

Returns:

std::string Content of the Eager-mode RX buffers.

inline std::string dump_eager_rx_buffers(bool dump_data = true)

Dump the content of all Eager-mode RX buffers to a string.

Returns:

std::string Content of all Eager-mode RX buffers.

std::string dump_communicator()

Dump the content of the communicator to a string.

Returns:

std::string Content of the communicator.

addr_t get_communicator_addr(communicatorId comm_id = GLOBAL_COMM)

Return CCLO address of communicator.

Parameters:

comm_id – Numerical ID of the target communicator.

Returns:

addr_t Address of the communicator in CCLO memory.

addr_t get_arithmetic_config_addr(std::pair<dataType, dataType> id)

Return CCLO address of arithmetic config.

Parameters:

id – Identifier (pair of data types) of the target arithmetic configuration.

Returns:

addr_t Address of the arithmetic configuration in CCLO memory.

inline int devicemem()

Retrieve the devicemem memory bank.

Returns:

int The devicemem memory bank

void open_port(communicatorId comm_id = GLOBAL_COMM)

Open a port on the network interface of the FPGA to exchange ACCL messages.

Parameters:

comm_id – Numerical ID of the target communicator.

void open_con(communicatorId comm_id = GLOBAL_COMM)

Open connections with other ranks.

Parameters:

comm_id – Numerical ID of the target communicator.

void close_con(communicatorId comm_id = GLOBAL_COMM)

Close connections with other ranks.

Parameters:

comm_id – Numerical ID of the target communicator.