Conjoint analysis in Python using a Max Diff sample, creating a score and ranking
I am learning how to do conjoint analysis using a max diff questionnaire. I created some dummy data with the code below (probably more complex than needed, but here it is):
import numpy as np
import pandas as pd

def random_choices(fruits, n_users=20):
    # For each user, draw two distinct fruits: the first is "most", the second "least"
    return [list(np.random.choice(fruits, size=2, replace=False)) for _ in range(n_users)]

# Fruits shown in each of the three questions
questions = [['orange', 'apple', 'banana'], ['banana', 'pear', 'peach'], ['apple', 'banana', 'pear']]
# One (most_i, least_i) column pair per question, placed side by side
df = pd.concat(
    [pd.DataFrame(random_choices(fruits), columns=[f'most_{i}', f'least_{i}'])
     for i, fruits in enumerate(questions, start=1)],
    axis=1)
The code should give you a dataframe that looks like this:
| user | most_1 | least_1 | most_2 | least_2 | most_3 | least_3 |
|---|---|---|---|---|---|---|
| 0 | orange | banana | pear | peach | apple | banana |
| 1 | orange | banana | peach | banana | pear | banana |
| 2 | orange | banana | pear | peach | apple | banana |
| 3 | orange | banana | pear | banana | apple | banana |
| 4 | banana | apple | peach | pear | banana | apple |
| 5 | orange | banana | pear | banana | banana | apple |
| 6 | banana | orange | pear | peach | banana | pear |
| 7 | apple | banana | peach | banana | pear | banana |
| 8 | orange | apple | pear | banana | banana | pear |
| 9 | orange | apple | banana | pear | apple | banana |
| 10 | apple | banana | pear | banana | pear | apple |
| 11 | apple | banana | pear | banana | apple | banana |
| 12 | apple | banana | banana | peach | banana | apple |
| 13 | orange | banana | peach | pear | pear | banana |
| 14 | apple | banana | banana | peach | apple | banana |
| 15 | apple | banana | peach | banana | banana | apple |
| 16 | apple | orange | peach | banana | pear | banana |
| 17 | banana | apple | pear | banana | apple | pear |
| 18 | apple | orange | peach | banana | pear | banana |
| 19 | banana | apple | peach | banana | pear | apple |
So in this example, a person picks their most and least favorite fruit from orange, apple, and banana. The next question offers banana, peach, and pear. The final one offers apple, banana, and pear. The real questionnaire includes more questions, but I stopped at 3 for the example.
I am trying to create new columns for each fruit in the dataframe. Each fruit will have a point column (+1 each time it is picked as most, -1 each time it is picked as least; I have the code for this, see below) and a rank column (the most points gets rank 1, the fewest gets rank 5). If two fruits have the same point value, the rank should be decided by comparing them in the questions where they appeared together. For example, if apple and banana have the same score, the fruit that did better in the groups they shared (1 and 3) will be ranked higher. The idea is that I can then see which fruits a specific user prefers, and in what order.
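The rule described above can be sketched for a single respondent in plain Python. This is only one possible reading of the tie-break: the names `QUESTIONS` and `score_and_rank` are made up for illustration, the fruit that was picked as neither most nor least is assumed to sit between the two, and any tie that survives the head-to-head comparison is assumed to be broken alphabetically.

```python
from collections import defaultdict

# Fruits shown in each question, in questionnaire order (from the example).
QUESTIONS = [
    ("orange", "apple", "banana"),
    ("banana", "pear", "peach"),
    ("apple", "banana", "pear"),
]


def score_and_rank(answers, questions=QUESTIONS):
    """answers: one (most, least) pair per question, in question order."""
    points = {fruit: 0 for group in questions for fruit in group}
    # wins[(a, b)] = number of shared questions in which a was placed above b
    wins = defaultdict(int)

    for group, (most, least) in zip(questions, answers):
        points[most] += 1
        points[least] -= 1
        # Assumed order within a question: most > unpicked > least
        middle = [f for f in group if f not in (most, least)]
        order = [most, *middle, least]
        for i, a in enumerate(order):
            for b in order[i + 1:]:
                wins[(a, b)] += 1

    def tie_break(fruit):
        # Net head-to-head record against fruits with the same point total
        rivals = [f for f in points if f != fruit and points[f] == points[fruit]]
        return sum(wins[(fruit, r)] - wins[(r, fruit)] for r in rivals)

    # Sort by points, then head-to-head record, then name (assumed last resort)
    ordered = sorted(points, key=lambda f: (-points[f], -tie_break(f), f))
    ranks = {fruit: pos for pos, fruit in enumerate(ordered, start=1)}
    return points, ranks


# Answers of user 0 from the example table
points, ranks = score_and_rank(
    [("orange", "banana"), ("pear", "peach"), ("apple", "banana")]
)
print(points)
print(ranks)
```

For user 0, orange, apple, and pear all end up with 1 point. Orange beat apple in question 1 and apple beat pear in question 3, so the sketch orders them orange, apple, pear. Note that a net head-to-head record is used rather than a pairwise comparator, because pairwise preferences among three tied fruits can form a cycle.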
So the new table will have this added:
| user | orange_pt | banana_pt | apple_pt | pear_pt | peach_pt | orange_rank | banana_rank | apple_rank | pear_rank | peach_rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -2 | 1 | 1 | -1 | 1 | 5 | 2 | 3 | 4 |
This should be repeated for each user (row) in the dataframe.
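Here is the code for scoring:
# Count how often each fruit was picked as "most" and as "least" per user (row)
most = df.filter(like='most').apply(lambda row: row.value_counts(), axis=1)
least = df.filter(like='least').apply(lambda row: row.value_counts(), axis=1)
# Score = most count - least count; fill_value=0 covers fruits missing on one side
scores = most.sub(least, fill_value=0).fillna(0).astype(int).add_suffix('_score')
df = df.join(scores)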
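As an alternative sketch of the same +1/-1 rule, the answers can be reshaped to long format first, so the scoring does not depend on how many questions there are. The two-row `df` below is a cut-down copy of users 0 and 4 from the example table, and the `_pt` suffix follows the column names in the desired output; both are choices made for illustration.

```python
import pandas as pd

# Users 0 and 4 from the example table
df = pd.DataFrame({
    "most_1": ["orange", "banana"], "least_1": ["banana", "apple"],
    "most_2": ["pear", "peach"], "least_2": ["peach", "pear"],
    "most_3": ["apple", "banana"], "least_3": ["banana", "apple"],
})

# One row per (user, answer column, fruit)
long = (
    df.rename_axis("user")
      .reset_index()
      .melt(id_vars="user", var_name="column", value_name="fruit")
)
# +1 for a "most" pick, -1 for a "least" pick
long["weight"] = long["column"].str.startswith("most").map({True: 1, False: -1})

# Sum the weights per user and fruit; fruits a user never picked get 0
points = (
    long.groupby(["user", "fruit"])["weight"]
        .sum()
        .unstack(fill_value=0)
        .add_suffix("_pt")
)
print(points)
```

The resulting columns are in alphabetical order by fruit, so they may need reordering before joining back onto the original dataframe.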
So I really just need to figure out a way to create the ranking. Any help is greatly appreciated!
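As a first pass, and only as a sketch, pandas' `DataFrame.rank` with `axis=1` gives a per-user ranking straight from the point columns. It does not apply the head-to-head tie-break; tied fruits simply share a rank, which at least shows which users need the extra comparison. The small `scores` frame below is typed in by hand from users 0 and 4 of the example.

```python
import pandas as pd

# Point totals for users 0 and 4 from the example table
scores = pd.DataFrame({
    "orange": [1, 0],
    "banana": [-2, 2],
    "apple": [1, -2],
    "pear": [1, -1],
    "peach": [-1, 1],
})

# Rank across each row: highest score gets rank 1, ties share the lowest rank
ranks = (
    scores.rank(axis=1, ascending=False, method="min")
          .astype(int)
          .add_suffix("_rank")
)

# Flag users whose ranking still contains ties
has_ties = ranks.apply(lambda row: row.duplicated(keep=False).any(), axis=1)

print(ranks)
print(has_ties)
```

Here the first user has three fruits sharing rank 1, so that row would still need the head-to-head step, while the second user's ranking is already unambiguous.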
THANKS!!!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
