I'm not a Python expert, but this is the quickest way I could find to get the mean of an array of n random values. I tried other ways first, but when I refactored the code I drifted into a functional style of programming and wrote it as a mathematical function. That's why I used a lambda: as far as I can tell, it's just an anonymous function. Again, I'm not an expert in theoretical computer science either. I just know that when I timed it with the timeit module, this function came out relatively quick, while the statistics module in Python 3.8 was very slow by comparison.
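For reference, here is a minimal sketch of what that lambda looks like. I'm not reproducing my exact code here, so treat this as an approximation of the avg_probs() function I describe further down:

    import random

    # Sketch of the lambda-style version: sum n uniform random floats and divide by n.
    avg_probs = lambda n: sum([random.random() for i in range(n)]) / n

    print(avg_probs(1_000_000))  # prints something close to 0.5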
I tried doing everything in NumPy, but for me it was slower than using builtin functions like sum. Instead of importing a mean function from some module, it was better just to sum the array of size n and divide by n to get the mean, i.e. the expected value. Basically, I create an array of size n of what are essentially probabilities: real numbers, Python floats, between 0 and 1. Then I run many trials, say 1 million, and take the mean (average value), which usually comes out around 0.50, or something like 0.4999.
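A rough sketch of the kind of timing comparison I mean (the list size and repeat count here are just placeholders, not my original benchmark):

    import random
    import statistics
    import timeit

    import numpy as np

    n = 1_000
    probs = [random.random() for i in range(n)]

    # Builtin sum divided by n, versus mean functions imported from other modules.
    print(timeit.timeit(lambda: sum(probs) / n, number=10_000))
    print(timeit.timeit(lambda: np.mean(probs), number=10_000))         # converts the list to an ndarray on every call
    print(timeit.timeit(lambda: statistics.mean(probs), number=10_000))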
What I was really wondering about was code optimization in Python. As I said, I'm not an expert in Python or in theoretical computer science, and I'm certainly not a mathematician. I just needed a way to generate a large number of probabilities in a list (an array, a one-dimensional vector). I found I could do this easily with a list comprehension using the random module, i.e. [random.random() for i in range(n)]. I tried different ways of calculating those random probabilities, and in every case my hypothesis held up: the mean after a million trials is about 0.50, usually something like 0.4999 or 0.5001. In any case, it turns out that one should use builtin functions when possible, because they are usually faster. It's almost like writing the function in C: the work isn't done in the interpreter, as far as I understood from my quick overview of the subject.
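To make that concrete, here is a small sketch of the kind of run I mean; since the uniform distribution on [0, 1] has expected value 0.5, the result should land very close to that:

    import random

    n = 1_000_000
    probs = [random.random() for i in range(n)]  # one million uniform "probabilities"

    print(sum(probs) / n)  # typically prints something like 0.4999... or 0.5001...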
I was starting to look into the internals of Python to see why builtin functions would be faster. This is the best summary I have found so far:
"Use Built-in Data Types: This one is pretty obvious. Built-in data types are very fast, especially in comparison to our custom types like trees or linked lists. That’s mainly because the built-ins are implemented in C, which we can’t really match in speed when coding in Python." - Making Python Programs Blazingly Fast
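As a rough illustration of that point (this is my own sketch, not from that article), the dis module shows that the builtin-based version compiles to just a handful of bytecode instructions around a single call into C code, while an explicit Python loop has to run its body through the interpreter once per element:

    import dis

    def mean_builtin(probs):
        # The whole summation happens inside the C implementation of sum().
        return sum(probs) / len(probs)

    def mean_loop(probs):
        # The interpreter executes the loop body for every element.
        total = 0.0
        for p in probs:
            total += p
        return total / len(probs)

    dis.dis(mean_builtin)  # a few instructions
    dis.dis(mean_loop)     # many more, including the loop machinery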
Addendum:
"A simple rule of thumb (but one you must back up using profiling!) is that more lines of bytecode will execute more slowly than fewer equivalent lines of bytecode that use built-in functions." - p.55, High Performance Python: Practical Performant Programming for Humans by Ian Ozsvald and Micha Gorelick* * *
I did a little more research. After looking into profiling and code optimization, I got to thinking about implementing my function in other programming languages, with the idea that maybe it would be faster in Fortran or C or whatnot. I'm mostly only familiar with Python, but I was able to write the function, in what I think is working code, in both the R language and in GNU Octave.
The GNU Octave version of my function is in a file named avg_probs.m, and I wrote an equivalent version in the R programming language as well.
Those were a few of the other languages I was able to "translate" my function into: my "average of probabilities" function, or avg_probs(), as I am calling it now.