
Well, other than crypto, what is there really to do that is more efficient on the GPU?


I don't think it needs to be more efficient than the CPU to merit moving to the GPU. If the horsing around to get the data in and out is less work than doing the job, then you may as well put the GPU to work and improve your total throughput.

Perhaps the memory page deduplication candidate detection could run out there. It would be memory bound, but maybe by not ruining the CPU cache it would be a win. (This is important for systems running a bunch of virtual machines.)
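A minimal CPU-side sketch of that candidate-detection pass (the GPU offload and the actual page merging are omitted; PAGE_SIZE, page_hash, find_candidates and the FNV hash choice are just illustrative): hash every page cheaply, and only byte-compare pages whose hashes collide.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Cheap FNV-1a hash of one page. A matching hash only nominates a
       candidate pair; a full memcmp() decides whether the pages really
       are identical and could be merged. */
    static uint64_t page_hash(const uint8_t *page)
    {
        uint64_t h = 1469598103934665603ULL;
        for (size_t i = 0; i < PAGE_SIZE; i++) {
            h ^= page[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Scan n pages and report pairs that are candidates for deduplication. */
    static void find_candidates(const uint8_t *pages, size_t n)
    {
        uint64_t *hashes = malloc(n * sizeof *hashes);
        if (!hashes)
            return;
        for (size_t i = 0; i < n; i++)
            hashes[i] = page_hash(pages + i * PAGE_SIZE);
        for (size_t i = 0; i < n; i++)
            for (size_t j = i + 1; j < n; j++)
                if (hashes[i] == hashes[j] &&
                    memcmp(pages + i * PAGE_SIZE,
                           pages + j * PAGE_SIZE, PAGE_SIZE) == 0)
                    printf("pages %zu and %zu can be merged\n", i, j);
        free(hashes);
    }

    int main(void)
    {
        static uint8_t pages[4 * PAGE_SIZE];  /* four zero-filled pages */
        pages[2 * PAGE_SIZE] = 1;             /* make page 2 unique     */
        find_candidates(pages, 4);
        return 0;
    }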


"the horsing around to get the data in and out" seems to be the key factor. An analysis of BLAS libraries' performance across several architectures [1] showed that GPU-based calculation only approached implementations like Goto BLAS with matrix dimensions well up into the thousands. That's just one example, but there seems to be a fair bit of overhead in getting the data to and from the GPU.

[1] http://dirk.eddelbuettel.com/blog/code/gcbd/


Calculating error correction codes, though efficiency depends on the memory architecture.

I heard Tsubame, a supercomputer built with NVIDIA GPUs, calculated ECC on its GPU-side memory with GPU code because those GPUs were consumer grade and didn't have hardware ECC.
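Real DRAM ECC is a SEC-DED code (it corrects single-bit flips and detects double-bit flips); the sketch below is a much weaker stand-in that only detects a flipped bit, using one parity bit per 64-bit word, just to show the shape of the encode-then-scrub pass such software ECC performs. All names here are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* Parity (xor of all bits) of one 64-bit word. */
    static uint8_t word_parity(uint64_t w)
    {
        w ^= w >> 32; w ^= w >> 16; w ^= w >> 8;
        w ^= w >> 4;  w ^= w >> 2;  w ^= w >> 1;
        return (uint8_t)(w & 1);
    }

    /* Record one parity bit per word when the buffer is written out. */
    static void ecc_encode(const uint64_t *buf, uint8_t *parity, size_t nwords)
    {
        for (size_t i = 0; i < nwords; i++)
            parity[i] = word_parity(buf[i]);
    }

    /* Scrub pass: return the index of the first word whose parity no longer
       matches, or nwords if the buffer still checks out. */
    static size_t ecc_scrub(const uint64_t *buf, const uint8_t *parity,
                            size_t nwords)
    {
        for (size_t i = 0; i < nwords; i++)
            if (word_parity(buf[i]) != parity[i])
                return i;
        return nwords;
    }

    int main(void)
    {
        uint64_t buf[4] = {1, 2, 3, 4};
        uint8_t parity[4];
        ecc_encode(buf, parity, 4);
        buf[2] ^= 1ULL << 17;                 /* simulate a single-bit flip */
        return ecc_scrub(buf, parity, 4) == 2 ? 0 : 1;
    }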


Routing: http://shader.kaist.edu/packetshader/

This was really non-obvious to me.


It's non-obvious because it's a bad idea. This paper comes from a wacky world where latency and power consumption don't matter. The CPU-vs.-GPU comparisons aren't that compelling even on the surface of it. The latency and power-consumption numbers (compared to dedicated ASICs for this sort of thing) are just laughable.

Being the most compelling 'software router' is sort of like being the 'tallest midget' but even in this domain, I think their alleged advantages over CPU-only are mainly due to carefully massaging the presentation of the data.


OCR (optical character recognition), image recognition and face detection, speech recognition, speech synthesis, video recognition.

Multi touch gestures and handwriting recognition.


Phiber Optik (Mark Abene) had a pretty interesting talk yesterday at NY Hacker about using CUDA for intrusion detection calculations.


RAID checksum computation looks like an obvious possibility. We'd need a battery backup for the VRAM, too :)


A single core can hash (checksum) at 5 GB/s using MurmurHash. The data you checksum is probably already in the L1/L2 cache (on a write to RAID) or about to be consumed by userland, so reading it on the CPU just means the userland process gets its data from cache instead (on a read from RAID). You can get maybe 2-6 GB/s to the GPU. Add the latency (synchronization, etc.) and the GPU time to calculate the hash, and you've probably slowed the whole process down radically. Additionally, assuming a DMA transfer, your memory subsystem is more stressed because both the CPU and the GPU read the same data.

Oh, and simple xor? Well, assuming the data is already in L2, an Intel i7 can xor at 10+ GB/s on a single core across 3 buffers, i.e. a minimal RAID 5. The fastest RAID adapters achieve only a fraction of that speed.
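A sketch of the xor being benchmarked here (buffer count and names are illustrative): RAID-5 parity is just the byte-wise xor of the data buffers, and the same loop regenerates a lost buffer from the survivors plus parity; compilers vectorize it readily, which is why a single core can push it at near memory speed.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* P = D0 ^ D1: two data buffers and one parity buffer, the minimal RAID 5.
       The identical loop recovers a missing buffer, e.g. D0 = P ^ D1. */
    static void xor_parity(const uint8_t *d0, const uint8_t *d1,
                           uint8_t *out, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            out[i] = d0[i] ^ d1[i];
    }

    int main(void)
    {
        uint8_t d0[512], d1[512], p[512], recovered[512];
        memset(d0, 0xAB, sizeof d0);
        memset(d1, 0x5C, sizeof d1);
        xor_parity(d0, d1, p, sizeof p);          /* compute parity        */
        xor_parity(p, d1, recovered, sizeof p);   /* "lose" d0, rebuild it */
        return memcmp(d0, recovered, sizeof d0) == 0 ? 0 : 1;
    }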


I think this is very memory intensive. Remember that the GPU would have to calculate block checksums, preferably straight from the main memory where the buffer resides.

Maybe block deduplication could be done this way. If the block is a dupe, skipping its allocation on disk (saving at least one block write) could offset the cost of a lot of block hash calculations.
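A rough sketch of that write-path decision, assuming an in-memory hash-to-block index (the table size, probing scheme, and the dedup_lookup_or_insert name are made up for illustration): on a hash hit the caller would still read the existing block and memcmp() it against the new data before skipping the write, since hashes can collide.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SLOTS 65536   /* toy size; a real index scales with the pool */

    struct dedup_entry { uint64_t hash; uint64_t blockno; bool used; };
    static struct dedup_entry table[TABLE_SLOTS];

    /* Returns true and the block number of a previously stored block with the
       same hash (a dedup candidate); otherwise records the new block's hash
       and returns false, meaning the block must actually be written. */
    static bool dedup_lookup_or_insert(uint64_t hash, uint64_t new_blockno,
                                       uint64_t *existing)
    {
        for (size_t probe = 0; probe < TABLE_SLOTS; probe++) {
            struct dedup_entry *e = &table[(hash + probe) % TABLE_SLOTS];
            if (!e->used) {
                e->used = true;
                e->hash = hash;
                e->blockno = new_blockno;
                return false;
            }
            if (e->hash == hash) {
                *existing = e->blockno;
                return true;
            }
        }
        return false;   /* table full: give up on dedup and just write it */
    }

    int main(void)
    {
        uint64_t existing;
        dedup_lookup_or_insert(42, 100, &existing);        /* first write     */
        return dedup_lookup_or_insert(42, 200, &existing)  /* duplicate found */
               && existing == 100 ? 0 : 1;
    }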


Working with polynomials. That's important in computational geometry:

https://domino.mpi-sb.mpg.de/intranet/ag1/ag1publ.nsf/0/ca00...


Calculating Viterbi paths for hidden Markov models on the GPU is an order of magnitude or two faster than doing it on the CPU. I worked on porting NVIDIA's OpenCL implementation to a more 'platform-neutral' version for the research project I'm involved in.
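This isn't the NVIDIA OpenCL kernel mentioned above, just a plain-C reference of the recurrence that gets parallelized (the model sizes and probabilities are made up): at each time step, each state's maximum over its predecessors is independent of the other states, which is what maps well onto GPU threads.

    #include <stdio.h>

    #define NS 3   /* hidden states       */
    #define NO 2   /* observation symbols */
    #define T  5   /* sequence length     */

    /* Toy model parameters (rows sum to 1); a real implementation would work
       in log space to avoid underflow on long sequences. */
    static const double init[NS]      = {0.6, 0.3, 0.1};
    static const double trans[NS][NS] = {{0.7, 0.2, 0.1},
                                         {0.3, 0.5, 0.2},
                                         {0.2, 0.3, 0.5}};
    static const double emit[NS][NO]  = {{0.9, 0.1},
                                         {0.4, 0.6},
                                         {0.1, 0.9}};

    int main(void)
    {
        int obs[T] = {0, 1, 1, 0, 1};
        double v[T][NS];   /* best path probability ending in state j at time t */
        int back[T][NS];   /* predecessor that achieved it                      */

        for (int j = 0; j < NS; j++)
            v[0][j] = init[j] * emit[j][obs[0]];

        for (int t = 1; t < T; t++)
            for (int j = 0; j < NS; j++) {    /* independent per state: GPU-friendly */
                double best = -1.0; int arg = 0;
                for (int i = 0; i < NS; i++) {
                    double p = v[t - 1][i] * trans[i][j];
                    if (p > best) { best = p; arg = i; }
                }
                v[t][j] = best * emit[j][obs[t]];
                back[t][j] = arg;
            }

        /* Trace back the most likely state sequence. */
        int path[T], last = 0;
        for (int j = 1; j < NS; j++)
            if (v[T - 1][j] > v[T - 1][last]) last = j;
        path[T - 1] = last;
        for (int t = T - 1; t > 0; t--)
            path[t - 1] = back[t][path[t]];

        for (int t = 0; t < T; t++)
            printf("%d ", path[t]);
        printf("\n");
        return 0;
    }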

Here are some more examples:

http://developer.download.nvidia.com/compute/opencl/sdk/webs...

There are many, many applications beyond crypto.


I think the question is about how the kernel can use the GPU. Linux probably doesn't need to train hidden Markov models. It might, however, need to do crypto (e.g., for an encrypted filesystem).


Oh gosh you're right. I wasn't thinking about the context in which the question was posed. Anyway, hopefully someone will find those examples interesting. NVIDIA's CUDA developer zone is chock full of great resources for GPGPU (like video lectures and tools and code examples).




