Difference between using pandas drop_duplicates or value_counts on the whole frame or on one column

I am a new Python user, learning it just to finish my homework, but I am willing to dig deeper when I run into questions. The problem comes from my professor's sample code for data cleaning. He uses drop_duplicates() and value_counts() to check the unique values of a DataFrame; here is his code:

spyq['sym_root'].value_counts() #method1
spyq['date'].drop_duplicates() #method2

Here is the output:

SPY    7762857 #method1
0    20100506  #method2
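Both of the professor's checks look at a single column, not at whole rows. The sketch below runs them on a hypothetical four-row frame named toy (the price column and its values are made up; only date and sym_root come from the question), then checks for repeated rows:

```python
import pandas as pd

# Hypothetical toy data standing in for spyq: every row has the same
# date and sym_root, and rows 0 and 1 are identical in every column.
toy = pd.DataFrame({
    "date": [20100506] * 4,
    "sym_root": ["SPY"] * 4,
    "price": [115.0, 115.0, 115.1, 115.2],  # made-up prices
})

# method1: one entry per distinct sym_root, holding how many rows have it
counts = toy["sym_root"].value_counts()
print(counts)

# method2: the distinct values of the date column (first occurrence kept)
dates = toy["date"].drop_duplicates()
print(dates)

# The SPY count equals the row count only because every row is an SPY row;
# it says nothing about whether whole rows repeat.
print(counts["SPY"] == len(toy))      # True
print(int(toy.duplicated().sum()))    # 1 (row 1 repeats row 0)
```

Because every row shares one symbol, value_counts() returns a single entry whose count equals the number of rows, and drop_duplicates() keeps only the first row label (0), which matches the shape of the output above. Neither check compares complete rows, so duplicate rows can still exist.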

Here is spyq.shape, to help you understand the spyq DataFrame:

spyq.shape #output is (7762857, 9)
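In pandas, DataFrame.shape is an attribute holding a (rows, columns) tuple rather than a method, which is why it is written without parentheses. A small sketch with a hypothetical two-column frame df:

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# shape is an attribute: read it directly and unpack the tuple
rows, cols = df.shape
print(rows, cols)  # 3 2

# Calling it like a method fails, because a tuple is not callable
try:
    df.shape()
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```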

spyq is a DataFrame containing the trading history of SPY (the S&P 500 ETF) for a single day, 05/06/2010. After seeing this, I wondered why he specifies a column ('date' or 'sym_root') and doesn't just call spyq.drop_duplicates() or spyq.value_counts() on the whole frame, so I gave it a try:

spyq.value_counts()
spyq.drop_duplicates()
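Called on the whole DataFrame, the same two methods compare entire rows: a row counts as a duplicate only if every column matches an earlier row. A minimal sketch on hypothetical trade data (the time and price columns are made up for illustration; DataFrame.value_counts() needs pandas 1.1 or later):

```python
import pandas as pd

# Hypothetical trades: rows 0 and 1 are identical in every column
toy = pd.DataFrame({
    "date": [20100506] * 4,
    "sym_root": ["SPY"] * 4,
    "time": ["09:30:00", "09:30:00", "09:30:01", "09:30:02"],
    "price": [115.0, 115.0, 115.1, 115.2],
})

# drop_duplicates() on the frame compares complete rows (all columns together)
unique_rows = toy.drop_duplicates()
print(unique_rows.shape)   # (3, 4)

# DataFrame.value_counts() returns a Series: one entry per distinct row
row_counts = toy.value_counts()
print(len(row_counts))     # 3
print(row_counts.max())    # 2, the repeated 09:30:00 trade
```

The length of the value_counts() result is what lines up with the row count of drop_duplicates(). In recent pandas versions, DataFrame.value_counts() also skips rows containing missing values by default (dropna=True), so on data with NaN the two numbers can differ.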

Both outputs are (6993487, 9). The row count has decreased! But judging from the professor's sample code, there should be no duplicated rows, because the count in method1's output is exactly the same as the row count from spyq.shape. I am confused about why the output of spyq.drop_duplicates() on the whole DataFrame is not the same as spyq['column'].drop_duplicates() when there are no repeated values. I tried using

spyq.loc[spyq.drop_duplicates()] 

to see what was dropped, but it raises an error. Can anyone kindly help me? I know my question is kind of stupid, but I just want to figure it out. I want to learn Python from the most fundamental parts, not just pick up some code to finish my homework. Thanks!
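.loc selects rows by label or by a boolean mask, while spyq.drop_duplicates() returns a whole DataFrame, which is neither, so that attempt fails. One way to see the removed rows is DataFrame.duplicated(), which returns a boolean mask marking every repeat of an earlier row. A minimal sketch on a hypothetical frame named toy:

```python
import pandas as pd

# Hypothetical data: row 1 repeats row 0 exactly
toy = pd.DataFrame({
    "time": ["09:30:00", "09:30:00", "09:30:01"],
    "price": [115.0, 115.0, 115.1],
})

# duplicated() (keep="first" by default) is True for each later copy,
# i.e. exactly the rows that drop_duplicates() removes
dropped = toy[toy.duplicated()]
print(dropped)     # row 1 only

# keep=False flags every copy, showing the originals next to their repeats
all_copies = toy[toy.duplicated(keep=False)]
print(all_copies)  # rows 0 and 1

# Same result as `dropped`, built from the index labels drop_duplicates() keeps
# (assumes the index has no repeated labels, e.g. the default RangeIndex)
kept_labels = toy.drop_duplicates().index
dropped_via_index = toy.loc[~toy.index.isin(kept_labels)]
print(dropped_via_index)
```

On the question's data, spyq[spyq.duplicated()] should contain 7762857 − 6993487 = 769370 rows, and spyq[spyq.duplicated(keep=False)] shows those rows together with the rows they repeat.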



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
