Terrain Generation Progress: Those dang fractals 2!
by Demolishun · 09/08/2012 (7:37 pm) · 8 comments
Fractals take a long time to process, depending on how many iterations you run. So of course I decided to explore moving part of the fractal computation into separate threads. Boy, was I in for a surprise. I really hoped I could make it a little faster, and I knew the GIL in Python could reduce the efficiency, but I was unprepared for what came next.
This image normally took about 33 seconds to compute. After dividing the load among 16 threads it took about 15 seconds. I was like holy shat! Another image I tested took 8 seconds normally; it dropped to 2 seconds. Wow! I am glad I played with threading. Now I need to explore multiprocessing so I can decouple from the GIL completely. The main reason for the speedup is that the math libraries I am using are written in C and don't need to hold the GIL the whole time. Note: if you are not familiar with the GIL in Python, search Google.
The first image is 1024x1024, calculated to 256 iterations. This image is 4096x4096 at 256 iterations. WARNING: this is a 6 MByte file! That one took 4 minutes to calculate; I am guessing it would take almost 10 minutes without threading.
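The reason threading helps at all despite the GIL is that numpy's C loops release the lock while they run, so chunks of the array can be crunched on different cores at once. A minimal Python 3 sketch of the same row-splitting idea (array sizes, thread count, and the `work` function are arbitrary illustrations, not the post's code):

```python
import threading
import time

import numpy as np

def work(rows):
    # Heavy vectorized math: numpy releases the GIL inside its C loops,
    # so several of these calls can overlap on different cores.
    for _ in range(20):
        np.sqrt(rows * rows + 1.0)

data = np.random.rand(1024, 1024)

start = time.time()
work(data)                                # one thread, whole array
serial = time.time() - start

start = time.time()
chunks = np.array_split(data, 4)          # split the rows among 4 workers
threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print("serial: %.3fs, 4 threads: %.3fs" % (serial, threaded))
```

The actual ratio depends on the machine; pure-Python loops would see no benefit at all, since they hold the GIL the whole time.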
Here is the code (there are a few libraries needed to run this):
from numpy import *
import numpy
import pylab
import matplotlib.cm as cm
import matplotlib.colors as mcolors
import matplotlib.pyplot as pyplot
import math
import time
from threading import Thread
numpy.seterr(all='warn')
class mandelbrot_thread(Thread):
    def __init__(self, maxit, h, w, divtime, z, c):
        Thread.__init__(self)
        self.maxit = maxit
        self.h = h
        self.w = w
        self.divtime = divtime
        self.z = z
        self.c = c

    def run(self):
        try:
            for i in xrange(self.maxit):
                self.z = (self.z**2) + self.c
                absz = self.z*conj(self.z)
                diverge = absz > 4  # 2**2: who is diverging
                div_now = diverge & (self.divtime == self.maxit)  # who is diverging now
                fi = float(i)
                # smooth (fractional) escape count instead of the raw integer i
                self.divtime[div_now] = int16(fi - log2(log((self.z**2)[div_now])))
        except Exception, e:
            print "thread error", e
def mandelbrot(h, w, maxit=20, mand=True):
    '''Returns an image of the Mandelbrot fractal of size (h,w).'''
    y,x = ogrid[ -1.5:1.5:h*1j, -1.5:1.5:w*1j ]
    xpos = -0.25
    ypos = -0.85
    diff = 1.0
    #y,x = ogrid[ ypos:ypos+diff:h*1j, xpos:xpos+diff:w*1j ]
    #y,x = ogrid[ -2.0:2.0:h*1j, -2.5:2.0:w*1j ]
    #y,x = ogrid[ 2.0:-2.0:h*1j, 2.5:-2.0:w*1j ]
    pos = -0.005
    xpos = 0.450 + pos + 0.00585
    ypos = 0.400 + 0.01125 + pos
    diff = 0.0006
    print xpos, ypos, diff, diff
    imag = 1j
    #y,x = ogrid[ ypos:ypos+diff:h*imag, xpos:xpos+diff:w*imag ]
    #y,x = ogrid[ -0.001:0.001:h*1j, -0.001:0.001:w*1j ]
    if mand:
        # Mandelbrot: c varies over the plane, z starts at zero
        c = x + y*imag
        y,x = ogrid[ 0.0:0.0:h*1j, 0.0:0.0:w*1j ]
        z = x + y*imag
    else:
        # Julia: z varies over the plane, c is a fixed constant
        # http://en.wikipedia.org/wiki/Julia_set
        z = x + y*imag
        gr = 1.61803398
        #c = complex(1-gr, 0.0)  # golden ratio
        #c = complex(gr-2, gr-1)
        #c = complex(0.285, 0.0)
        #c = complex(0.285, 0.01)
        #c = complex(0.45, 0.1428)
        #c = complex(-0.70176, -0.3842)
        #c = complex(-0.835, -0.2321)
        #c = complex(-0.8, 0.156)
        c = [complex(0.1, gr-1)]
    divtime = maxit + zeros(z.shape, dtype=numpy.int16)
    # threaded version: split the rows into 16 slices, one thread each
    if(1):
        mtlist = []
        axis = divtime.shape[0]
        axislen = axis/16
        for i in xrange(0, axis, axislen):
            if len(c) == 1:
                mt = mandelbrot_thread(maxit, h, w, divtime[i:i+axislen], z[i:i+axislen], c[0])
            else:
                mt = mandelbrot_thread(maxit, h, w, divtime[i:i+axislen], z[i:i+axislen], c[i:i+axislen])
            mt.start()
            mtlist.append(mt)
        for mt in mtlist:
            mt.join()
        return divtime
    # single threaded version, disabled while testing the threads
    if(0):
        if len(c) == 1:
            divtime = divtime[0]
            z = z[0]
        else:
            divtime = divtime[0]
            z = z[0]
            c = c[0]
        for i in xrange(maxit):
            z = (z**2) + c
            absz = z*conj(z)
            diverge = absz > 4  # 2**2: who is diverging
            div_now = diverge & (divtime == maxit)  # who is diverging now
            fi = float(i)
            #divtime[div_now] = fi - log2(log(absz[div_now]))
            divtime[div_now] = int16(fi - log2(log((z**2)[div_now])))
        return divtime
def fractal(h, w, maxit=20, set=0):
    y,x = ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ]
    print y[0], x[0][0]
    y,x = ogrid[ -1.4:1.4:h, -2:0.8:w ]
    print y[0], x[0]
def main():
    res = 1024
    #iterations = 27
    iterations = 256
    #pylab.imshow(mandelbrot(res,res,maxit=50), cmap=cm.Greys_r)
    #fractal(res,res,maxit=50)
    mand = False
    #mand = True
    start = time.time()
    setcalc = mandelbrot(res, res, maxit=iterations, mand=mand)
    #setcalc += mandelbrot(res,res,maxit=iterations*2,mand=mand)
    #setcalc += mandelbrot(res,res,maxit=iterations*3,mand=mand)
    # stretch the divergence times into the full 16 bit range
    smin = setcalc.min()
    smax = setcalc.max()
    setcalc -= smin
    setcalc = (setcalc/float(smax-smin))*65535
    print smax, smin
    firedict = {
        'red':   [(0.0, 0.0, 0.0),
                  (0.02, 0.2, 0.2),
                  (0.25, 0.8, 0.75),
                  (0.75, 0.9, 1.0),
                  (1.0, 1.0, 1.0)],
        'green': [(0.0, 0.0, 0.0),
                  (0.75, 0.9, 1.0),
                  (1.0, 0.9, 1.0)],
        'blue':  [(0.0, 0.0, 0.0),
                  (0.75, 0.1, 1.0),
                  (1.0, 0.6, 1.0)]}
    greydict = {
        'red':   [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
        'green': [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
        'blue':  [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]}
    # 65536 entry colormap so all 16 bits map to distinct colors
    big_heat_map = mcolors.LinearSegmentedColormap("big_heat", firedict, 65536, gamma=1.0)
    print big_heat_map.N
    pyplot.register_cmap(cmap=big_heat_map)
    pylab.imshow(setcalc, cmap=pyplot.get_cmap("big_heat"))
    end = time.time() - start
    print end
    pylab.show()
    pylab.imsave('tmp.png', setcalc, cmap=pyplot.get_cmap("big_heat"))

if __name__ == '__main__':
    main()

About the author
I love programming, I love programming things that go click, whirr, boom. For organized T3D Links visit: http://demolishun.com/?page_id=67
#2
09/09/2012 (12:44 pm)
@Dayuppy, yep, that is why I am going to create a nice library just for projecting fractals. I am also working on a fractal tool for isolating coordinates. It will combine fast rendering with higher resolution rendering. I will use this program to start creating a database of useful coordinates in both the Mandelbrot and Julia sets.
Another thing I will explore is getting the fractal to animate through coordinates. I want to see what happens across a series of renders when I slightly change the fixed Julia coordinate each time. A morphed fractal may have some value for simulating noise.
One final feature will be extremely high resolution images for wallpapers and such. The beauty is that I will have a front end to the library and will use the library in the game as well: the same codebase for all of the fractal generation.
#3
09/11/2012 (5:07 am)
Very interested in how this turns out, especially if you use PolyVox for certain assets like rocks, arcs, etc. This could get very nifty indeed!
#4
09/12/2012 (1:19 am)
Hey Frank, just a couple of things for you to consider if you didn't already, with varying degrees of pedantry.
When threading, if your tasks are all CPU bound (as they are with fractal generation) then there is no benefit to using more threads than there are CPU cores. Due to extra context switches, it can theoretically be slower to use too many threads. Querying the number of cores should be easy, but hardcoding to however many cores you have is easier if it's just for you. However, the actual numbers depend a lot on various factors and it may not translate to any kind of noticeable difference. It's still worth benchmarking, though.
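Querying the core count really is a one-liner in Python; a sketch of sizing the worker pool from it (the `rows`/`chunk` names are illustrative, not taken from the post's code):

```python
# Sketch: derive the worker count from the machine instead of hardcoding 16.
import multiprocessing

num_threads = multiprocessing.cpu_count()    # logical cores on this machine
rows = 1024                                  # fractal height in pixels (illustrative)
chunk = max(1, rows // num_threads)          # rows handled by each worker
print("%d threads, %d rows per thread" % (num_threads, chunk))
```

The fractal code could then slice its arrays in steps of `chunk` rather than a fixed sixteenth of the image.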
Python seems a bit of a weird choice for generating fractals. I guess it's fine for experimenting, and if the generation times for your real world production uses are acceptable then that's fine too. However, if you want the fastest possible fractal generation you should probably look into using shaders to do it on the GPU. There should be lots of example code around for OpenGL and DirectX based fractal generation, and it shouldn't be too hard to get it running in Python.
Aside from speed, using OpenGL has some other advantages. If you use a floating point (or half float) texture for the fractal, then you have much more precision than you would have in an ordinary 8bit RGB texture. For displaying pretty pictures it makes no difference, but for terrain generation etc it may make a significant difference. You only need a single channel in the fractal texture so using floating point ends up as 4 bytes per pixel, and half float ends up as 2 bytes per pixel. For the purposes of making the pretty pictures, you just use the pixel value as a lookup into a 1 dimensional palette texture and render into a normal RGB(A) texture.
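To put rough numbers on the precision point, here is a quick numpy check of how many digits survive each storage width (using `float16`/`float32` as stand-ins for half-float and float texture channels):

```python
import numpy as np

height = 0.123456789                              # a sample terrain height in [0, 1]

as_u8  = np.uint8(round(height * 255)) / 255.0    # 8-bit channel: only 256 levels
as_f16 = float(np.float16(height))                # half float: ~3 decimal digits
as_f32 = float(np.float32(height))                # float: ~7 decimal digits

print(as_u8, as_f16, as_f32)
```

For pretty pictures the 8-bit error is invisible, but as a heightmap it quantizes the terrain into visible steps.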
Anyway, it's all pretty simple in practice and you don't need to know an awful lot of OpenGL to pull it off. You should be able to find example code for all the bits and pieces. I actually did something similar last year as an exercise to learn OpenGL ES 2 shaders ... which turned out to be a totally stupid idea in the end because mobile GPUs suck at the kind of things you need for fractals :)
A random aside; it seems that by far the majority of my interaction with the GG site in the last couple of years has been commenting on your blogs. I am not sure why that is, but I find it mildly amusing.
T.
#5
09/12/2012 (11:55 am)
@Tom, my machine is a 6 core Phenom (AMD), just for reference.
I started with just 8 threads, and for a small fractal I got about a 2x speedup. So I pushed further to 16, and the time dropped dramatically, to a 4x speedup. All I can think is that each core has hyperthreading, giving me an effective 12 cores. When I set the thread count to 32 I saw no speedup over 16. I am scaling the number of threads as a power of 2 because my data set resolutions are powers of 2, for simplicity.
Python is not actually doing the fractal generation; the numpy library, written in C, is doing the majority of the work. I also found some examples that use the GPU, which I am looking into. The reason I am not doing that now is that the abstraction numpy provides simplifies the math. If I can run this in a separate thread then it will be "fast enough" for generating terrain data; I don't need to bog down the GPU, and the user won't see the load.
All my fractals are being generated with U16 precision, the same precision used for the terrain. Look at big_heat_map and notice that the resolution of the generated color map is 65536, i.e. 16 bit. I will have it output U16 PNGs for use with the terrain. I am taking advantage of the existing libraries to generate the nice views and pictures. I am not sure what format it is spitting out now; it is probably 8-bit RGBA. I have code to explicitly save to U16, which is the format the data is in when generated.
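The stretch into the U16 range that main() performs can be sketched in isolation (toy divergence times, not real fractal output):

```python
import numpy as np

# Toy divergence times, as produced by the escape-time loop.
divtime = np.array([[3, 17], [42, 256]], dtype=np.int32)
smin, smax = divtime.min(), divtime.max()

# Map [smin, smax] onto the full unsigned 16-bit range, as main() does.
u16 = ((divtime - smin) / float(smax - smin) * 65535).astype(np.uint16)
print(u16)
```

The minimum always lands on 0 and the maximum on 65535, so every render uses the full 16-bit heightmap range regardless of the iteration count.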
My extension for T3D is Python. I am directly connected to the engine. So for creating tools it makes sense to test them in an independent Python app. Then in T3D I will have a function call the libraries I developed outside of T3D. It greatly speeds up development and integration is easy.
#6
09/12/2012 (10:18 pm)
Uh oh, I made a mistake. The AMD 6 Core Phenom does NOT have hyperthreading. So I have no clue why my code ran faster when going past 8 threads to 16. There is something else going on here. Perhaps I was able to effectively keep the instruction queue full? Any CPU gurus out there that can enlighten the heathen masses?
#7
09/14/2012 (1:24 pm)
I am going to have a chance to test some things with the multi cores. I just ordered an 8 core system. My graphics artist is going to be getting my 6 core system. This should give me some insight as to what is really going on.
#8
09/16/2012 (7:06 pm)
@Tom, I am testing some OpenCL code from within Python (PyOpenCL). I may be able to spread the load across all OpenCL compliant devices pretty easily. This should include any GPU, CPU, etc. This is fully supported by Intel, AMD, Nvidia, and ARM. I will report back in a new blog when I test this with the same fractal code. It looks like there is support for numpy arrays as well.

Torque 3D Owner Dayuppy
Phantom Games Development