SIMD filtering on large compressed data arrays

$275-800 USD

Completed

Posted

about 9 years ago

Paid on delivery

Contract: thrust-based SIMD filter library for compressed data.

This document describes the interface and functionality of a CUDA/thrust-based library for single-instruction-multiple-data (SIMD) filtering on large compressed data arrays.

Introduction: In columnar databases, a data table is decomposed by column, such that the set of corresponding rows across the columns represents a record from the original table. Large-scale data analytics on columnar databases requires evaluating predicates against these columns, such as comparing each element in a column against a literal. Further, these columns are often stored on disk in compressed form. To optimize analytics performance, we require a means of acting on compressed columns in such databases, on the GPU and without decompression. Such a modular library is described below.

Types of compression:

Frame-of-reference (FOR) compression: integer (long int) and float columns are stored in this format. It relies on the common property that the range of values within a column is generally much smaller than the range a standard long int can encode. For example, if the data within a column range from 100 to 4000, each element, stored as an offset from the minimum, requires only 12 bits. Rounding this bit count up to byte alignment gives 2 bytes (16 bits, unsigned short) per element, so four elements can be packed into each 64-bit standard long int. The meta information required to reconstruct the column includes the number of bytes per record (bpr), the total number of records represented, and the minimum value M seen within the column. Each element X in the original column is stored as (unsigned short) (X-M), which is (X-100) in this example, and the reconstruction of element i from the compressed stream S is given by (long int) (S[i/bpr + i%bpr]) + M.

Dictionary compression: string columns are stored in this format. The format consists of a dictionary of all unique string records found within the column, followed by a list of integer indexes into that dictionary. Further, the index list is FOR-compressed.

FORInfo class: contains meta information about a compressed data array that represents integer or floating-point data.

DICTInfo class: contains meta information associated with a dictionary-compressed string column.

API:

template<typename T> thrust::iterator<bool> filterLiteral(void * cdata, T literal, FORInfo x, Predicate pred): returns a boolean iterator representing the predicate value “cdata[i] pred literal” for each packed element in the compressed cdata array. pred is a binary operator, and x is a FORInfo struct.

template<typename T> thrust::iterator<bool> compareColumns(void *cdataA, void *cdataB, T aInfo, T bInfo, Predicate pred): returns a boolean iterator representing the predicate value “cdataA[i] pred cdataB[i]” for corresponding elements in the compressed source arrays. pred is a binary operator, and the template parameter T is either FORInfo or DICTInfo.

thrust::iterator<bool> AND (thrust::iterator<bool> A, thrust::iterator<bool> B): returns a boolean iterator where each element in the sequence is the result of a logical and, A[i] AND B[i].

thrust::iterator<bool> OR (thrust::iterator<bool> A, thrust::iterator<bool> B): returns a boolean iterator where each element in the sequence is the result of a logical or, A[i] OR B[i].

thrust::iterator<bool> NOT (thrust::iterator<bool> A): returns a boolean iterator where each element in the sequence is the result of a logical not, !A[i].

The contractor must also implement the Predicate structs for ==, !=, <, <=, >, >=, plus like and not_like for string types.
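
As a rough illustration of the FOR layout, filterLiteral, and the boolean combinators described above, here is a minimal host-side C++ sketch. It uses plain loops rather than CUDA or thrust, and it is not the contracted library. Several things are assumptions: the posting does not list the fields of FORInfo, so the three members below are taken from the meta information it mentions; the helper names forPack, forGet, maskAnd, maskOr and maskNot are hypothetical; results are returned as std::vector<bool> instead of a thrust iterator; and elements are assumed to be packed from the low bits upward inside each 64-bit word, so element i sits in word i / (8/bpr) at slot i % (8/bpr). That is one possible reading of the indexing expression in the posting, not something the posting confirms. Only integer columns are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical metadata layout; the posting names FORInfo but not its fields.
struct FORInfo {
    std::size_t bpr;        // bytes per record (assumed to be 1, 2, 4 or 8)
    std::size_t count;      // total number of records
    std::int64_t minValue;  // M, the minimum value seen in the column
};

// Pack (x - M) into bpr bytes each, low slot first, inside 64-bit words.
std::vector<std::uint64_t> forPack(const std::vector<std::int64_t>& col,
                                   const FORInfo& info) {
    const std::size_t perWord = 8 / info.bpr;
    std::vector<std::uint64_t> words((info.count + perWord - 1) / perWord, 0);
    for (std::size_t i = 0; i < info.count; ++i) {
        const std::uint64_t delta =
            static_cast<std::uint64_t>(col[i] - info.minValue);
        // The offset must fit in bpr bytes, otherwise the layout is invalid.
        assert(info.bpr == 8 || (delta >> (8 * info.bpr)) == 0);
        words[i / perWord] |= delta << (8 * info.bpr * (i % perWord));
    }
    return words;
}

// Decode element i on the fly; no decompressed column is materialized.
std::int64_t forGet(const std::uint64_t* words, const FORInfo& info,
                    std::size_t i) {
    const std::size_t perWord = 8 / info.bpr;
    const std::size_t bits = 8 * info.bpr;
    const std::uint64_t mask =
        bits == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << bits) - 1);
    const std::uint64_t delta =
        (words[i / perWord] >> (bits * (i % perWord))) & mask;
    return static_cast<std::int64_t>(delta) + info.minValue;
}

// Evaluates "cdata[i] pred literal" for every packed element.
template <typename Pred>
std::vector<bool> filterLiteral(const void* cdata, std::int64_t literal,
                                const FORInfo& info, Pred pred) {
    const auto* words = static_cast<const std::uint64_t*>(cdata);
    std::vector<bool> out(info.count);
    for (std::size_t i = 0; i < info.count; ++i)
        out[i] = pred(forGet(words, info, i), literal);
    return out;
}

// Element-wise combinators; both inputs are assumed to have equal length.
std::vector<bool> maskAnd(const std::vector<bool>& a, const std::vector<bool>& b) {
    std::vector<bool> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] && b[i];
    return out;
}

std::vector<bool> maskOr(const std::vector<bool>& a, const std::vector<bool>& b) {
    std::vector<bool> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] || b[i];
    return out;
}

std::vector<bool> maskNot(const std::vector<bool>& a) {
    std::vector<bool> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = !a[i];
    return out;
}
```

For the column {100, 4000, 250, 3999, 101} with bpr = 2 and M = 100, forPack produces two 64-bit words, and filterLiteral(packed.data(), 250, info, std::greater<std::int64_t>()) yields false, true, false, true, false. In a GPU version the same per-element decode-and-compare would presumably run inside a device functor, with the result exposed lazily instead of being stored in a vector.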

C Programming

CUDA

Database Programming

Project ID: 7606423

About this project

6 proposals

Remote project

Active 9 years ago

Awarded to:

@arnechristiansen

A bit about myself: I am from Sofia, Bulgaria, and I work in the computer games industry on rendering and engine programming. On Freelancer I mostly do GPGPU code, plus the occasional rendering task. I can show you some of my code, which is on GitHub, so you can judge the quality of my work. I cannot give a precise price for the project yet, since the tasks are not defined; it may go higher or lower. As for working hours, I work 8 hours on weekends and up to 3 hours on working days. Hope this helps. Regards

$597 USD in 30 days

5.0

(5 reviews)

4.3

6 freelancers are bidding an average of $542 USD on this job

@prad08

Hello, I am an experienced programmer with specializations in parallel computing. I have extensive experience with CUDA and OpenCL and will be able to guarantee a quality job. If you could provide a few more details about the algorithm and performance requirements, I will be able to give a better time and budget estimation. Is there a sequential implementation already available? What exactly are your required deliverables? Hope to hear from you.

$750 USD in 15 days

4.9

(15 reviews)

4.5

@mdakteruzzaman

Hi, I have been working as an IT consultant and software architect for more than 10 years. I became OCP, CCNP, and RHCE certified 3 years ago. Before that I completed a B.Sc. Engg and an M.Sc. Engg, both in computer engineering. Hire only technical professionals, not just writers. I am both, and I can send you samples of my work. I assure you that you will score more than 90%. Check my work here: https://www.freelancer.com/projects/Software-Architecture-Engineering/Software-engineering.5777740.html https://www.freelancer.com/projects/the-analysis-design-software-system.html https://www.freelancer.com/projects/Software-Architecture-Engineering/foundation-software-engineering.html https://www.freelancer.com/projects/Software-Architecture-Java/Object-orientated-formal-specification.html I assure you the best quality. I look forward to hearing from you very soon.

$597 USD in 6 days