I looked online for performing the combining several continuous probability distributions into one continuous probability distribution. This method is called Conflation, the method can be found in the following article: An Optimal Method for Consolidating Data from Different Experiments. In this article, I found out that it was better to perform Conflation instead of averaging to combine distributions.
From what I understood from the article is that equation performs by multiplying each probability density values from several probability distributions divided by the integration of the product of each probability density value from several probability distributions for continuous distribution while for the discrete distribution it is done by multiplying each probability density value from several probability distributions divided by the summation of each probability density value from several probability distributions. (Details can be found on page 5 of the article)
Say I have around 4 lists from, for example, 4 norm distributions, for example,
list_1 = [5, 8, 6, 2, 1]
list_2 = [2, 6, 1, 3, 8]
list_3 = [1, 9, 2, 7, 5]
list_4 = [3, 2, 4, 1, 6]
and implementing the Conflation the result list becomes,
Con_list = [2.73, 34.56, 3.69, 3.23, 12]
(Correct me if I am wrong)
how is it possible to implement both equations in the photo into python to get the Conflation of inputted PDF distribution?
I found stackflow question before regarding averaging list and the code was the following,
def average(l):
llen = len(l)
def divide(x):
return x / llen
# return map(divide, map(sum, zip(*l)))
return map(divide, map(sum, zip(l)))
I have been trying to recode this function to follow the equation above but I can't find a way to get conflated pdf for a continuous distribution.
Edit 1:
Based on the answer from @Josh Purtell, I rewrote the code, however, I keep on getting the following error message:
Error Message:
Traceback (most recent call last):
File "/tmp/sessions/c903d99d60f20c3b/main.py", line 72, in <module>
graph=conflate_pdf(domain, dists,lb,ub)
File "/tmp/sessions/c903d99d60f20c3b/main.py", line 58, in conflate_pdf
denom = quad(prod_pdf, lb, ub, args=(dists))[0]
File "/usr/local/lib/python3.6/dist-packages/scipy/integrate/quadpack.py", line 341, in quad
points)
File "/usr/local/lib/python3.6/dist-packages/scipy/integrate/quadpack.py", line 448, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
TypeError: only size-1 arrays can be converted to Python scalars
Code:
def prod_pdf(x,pdfs):
prod=np.ones(pdfs[0].shape[0])
for pdf in pdfs:
prod=prod*pdf
return prod
def conflate_pdf(x,dists,lb,ub):
denom = quad(prod_pdf, lb, ub, args=(dists))[0]
return prod_pdf(x,dists)/denom
lb=-10
ub=10
domain=np.arange(lb,ub,.01)
dist_1 = stats.norm.pdf(domain, 2,1)
dist_2 = stats.norm.pdf(domain, 2.5,1.5)
dist_3 = stats.norm.pdf(domain, 2.2,1.6)
dist_4 = stats.norm.pdf(domain, 2.4,1.3)
dist_5 = stats.norm.pdf(domain, 2.7,1.5)
dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
graph=conflate_pdf(domain, dists,lb,ub)
from matplotlib import pyplot as plt
plt.plot(domain, dist_1)
plt.plot(domain, dist_2)
plt.plot(domain, dist_3)
plt.plot(domain, dist_4)
plt.plot(domain, dist_5)
plt.plot(domain,graph)
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.show()
From the code, what causes this error?
Edit 2:
I managed to rewrite the code to look into lists of distribution instead of getting the pdf in the product function in Edit 1, but still, I keep on having the same error in Edit 1.
Code:
def prod_pdf(x,pdfs):
prod=np.ones(np.array(pdfs)[0].shape)
for pdf in pdfs:
print(prod)
for c,y in enumerate(pdf):
prod[c]=prod[c]*y
print('final:', prod)
return prod
def conflate_pdf(x,dists,lb,ub):
denom = quad(prod_pdf, lb, ub, args=(dists))[0]
print('Denom: ',denom)
print('product pdf: ', prod_pdf(x,dists))
conflated_pdf=prod_pdf(x,dists)/denom
print(conflated_pdf)
return conflated_pdf
lb=-10
ub=10
domain=np.arange(lb,ub,.01)
dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)
from matplotlib import pyplot as plt
plt.plot(domain, dist_1, 'r')
plt.plot(domain, dist_2, 'g')
plt.plot(domain, dist_3, 'b')
plt.plot(domain, dist_4, 'y')
plt.plot(domain, dist_5, 'c')
dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
graph=conflate_pdf(domain, dists,lb,ub)
plt.plot(domain,graph, 'm')
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.show()
Edit 3:
I tried to run the following code (based on an answer from @Josh Purtell), but, I keep on getting one variable it gets the whole array after product function and it produces the same error message regarding the size-1 array. See the following code with a portion of the output:
Code:
from scipy.integrate import quad
from scipy import stats
import numpy as np
def prod_pdf(x,dists):
p_pdf=1
print('Incoming Array:', p_pdf)
for dist in dists:
p_pdf=p_pdf*dist
print('final:', p_pdf)
return p_pd
def conflate_pdf(x,dists,lb,ub):
print('Input product pdf: ', prod_pdf(x,dists))
denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
# denom = simps(prod_pdf)
# denom = nquad(func=(prod_pdf), ranges=([lb, ub]), args=(dists,))[0]
print('Denom: ', denom)
conflated_pdf=prod_pdf(x,dists)/denom
print('Conflated PDF: ', conflated_pdf)
return conflated_pdf
lb=-10
ub=10
domain=np.arange(lb,ub,.01)
dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)
from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.legend()
plt.plot(domain, dist_1, 'r', label='Dist. 1')
plt.plot(domain, dist_2, 'g', label='Dist. 2')
plt.plot(domain, dist_3, 'b', label='Dist. 3')
plt.plot(domain, dist_4, 'y', label='Dist. 4')
plt.plot(domain, dist_5, 'c', label='Dist. 5')
dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
print('distribution list:
', dists)
graph=conflate_pdf(domain, dists,lb,ub)
plt.plot(domain,graph, 'm', label='Conflated Dist.')
plt.show()
Here is a small portion of the output:
Incoming Array: 1
final: [2.14638374e-32 2.41991991e-32 2.72804284e-32 ... 6.41980576e-15
5.92770938e-15 5.47278628e-15]
final: [4.75178372e-48 5.66328097e-48 6.74864868e-48 ... 7.03075979e-21
6.27970218e-21 5.60806584e-21]
final: [2.80912097e-61 3.51131870e-61 4.38823989e-61 ... 1.32670185e-26
1.14952951e-26 9.95834610e-27]
final: [1.51005552e-81 2.03116529e-81 2.73144352e-81 ... 1.76466623e-34
1.46198598e-34 1.21092834e-34]
final: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
2.98464396e-40 2.39335035e-40]
Input product pdf: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
2.98464396e-40 2.39335035e-40]
Incoming Array: 1
final: [2.14638374e-32 2.41991991e-32 2.72804284e-32 ... 6.41980576e-15
5.92770938e-15 5.47278628e-15]
final: [4.75178372e-48 5.66328097e-48 6.74864868e-48 ... 7.03075979e-21
6.27970218e-21 5.60806584e-21]
final: [2.80912097e-61 3.51131870e-61 4.38823989e-61 ... 1.32670185e-26
1.14952951e-26 9.95834610e-27]
final: [1.51005552e-81 2.03116529e-81 2.73144352e-81 ... 1.76466623e-34
1.46198598e-34 1.21092834e-34]
final: [1.09076800e-97 1.55234627e-97 2.20861552e-97 ... 3.72095218e-40
2.98464396e-40 2.39335035e-40]
I managed to look into the code to implement the same method in Edit 3, I edited the code where it gets the first variables from each distribution however, for the rest of the loop it keeps on printing the same values, it does not go to the next values in the lists and Conflated distribution is a single variable. See the following code with a portion of the output:
Code:
from scipy.integrate import quad
from scipy import stats
import numpy as np
def prod_pdf(x,dists):
p_pdf=1
print('Incoming Array:', p_pdf)
for c,dist in enumerate(dists):
p_pdf=p_pdf*dist[c]
print('final:', p_pdf)
return p_pdf
def conflate_pdf(x,dists,lb,ub):
print('Input product pdf: ', prod_pdf(x,dists))
denom = quad(prod_pdf, lb, ub, args=(dists,))[0]
# denom = simps(prod_pdf)
# denom = nquad(func=(prod_pdf), ranges=([lb, ub]), args=(dists,))[0]
print('Denom: ', denom)
conflated_pdf=prod_pdf(x,dists)/denom
print('Conflated PDF: ', conflated_pdf)
return conflated_pdf
lb=-10
ub=10
domain=np.arange(lb,ub,.01)
dist_1 = st.norm.pdf(domain, 2,1)
dist_2 = st.norm.pdf(domain, 2.5,1.5)
dist_3 = st.norm.pdf(domain, 2.2,1.6)
dist_4 = st.norm.pdf(domain, 2.4,1.3)
dist_5 = st.norm.pdf(domain, 2.7,1.5)
from matplotlib import pyplot as plt
plt.xlabel("domain")
plt.ylabel("pdf")
plt.title("Conflated PDF")
plt.legend()
plt.plot(domain, dist_1, 'r', label='Dist. 1')
plt.plot(domain, dist_2, 'g', label='Dist. 2')
plt.plot(domain, dist_3, 'b', label='Dist. 3')
plt.plot(domain, dist_4, 'y', label='Dist. 4')
plt.plot(domain, dist_5, 'c', label='Dist. 5')
dists=[dist_1, dist_2, dist_3, dist_4, dist_5]
print('distribution list:
', dists)
graph=conflate_pdf(domain, dists,lb,ub)
plt.plot(domain,graph, 'm', label='Conflated Dist.')
plt.show()
A portion of the output:
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Denom: 3.95931575736075e-96
Incoming Array: 1
final: 2.1463837356630605e-32
final: 5.0231307782193034e-48
final: 3.266239495519432e-61
final: 2.187514996217005e-81
final: 1.979657878680375e-97
Conflated PDF: 0.049999999999999996
Edit 4:
I implemented the following code and it seems to work, also, I managed to sort out the problem with quad
it seems if I changed the quad
into fixed_quad
and normalise the pdf list. I will get the same result. Here is the following code:
import scipy.stats as st
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, Normalizer, normalize, StandardScaler
from scipy.integrate import quad, simps, quad_vec, nquad, cumula