PySpark: Dynamically update the column positions of a DataFrame according to another DataFrame
I need to change column positions frequently. Instead of changing the code each time, I created a temporary DataFrame, Index_df, where I update the column positions; those changes should then be reflected in the actual DataFrame.
sample_df
F_cDc,F_NHY,F_XUI,F_NMY,P_cDc,P_NHY,P_XUI,P_NMY
415 258 854 245 478 278 874 235
405 197 234 456 567 188 108 267
315 458 054 375 898 978 677 134
Index_df
col,position
F_cDc,1
F_NHY,3
F_XUI,5
F_NMY,7
P_cDc,2
P_NHY,4
P_XUI,6
P_NMY,8
The column order of sample_df should change according to Index_df.
Expected output:
F_cDc,P_cDc,F_NHY,P_NHY,F_XUI,P_XUI,F_NMY,P_NMY
415 478 258 278 854 874 245 235
405 567 197 188 234 108 456 267
315 898 458 978 054 677 375 134
Here the column positions are changed according to the positions I set in Index_df.
I could do sample_df.select("<column order>"), but I have more than 70 columns, so hard-coding the order is not practical.
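One possible way to derive the order from Index_df instead of hard-coding it is sketched below. It assumes Index_df is small enough to collect to the driver and that its columns are named "col" and "position"; the helper name ordered_columns is hypothetical. The sorting step is plain Python, and the Spark calls are shown as comments so the snippet runs without a cluster.

```python
from typing import Iterable, List, Tuple


def ordered_columns(index_rows: Iterable[Tuple[str, int]]) -> List[str]:
    """Return column names sorted by their target position."""
    # int() guards against positions that were read as strings (e.g. from CSV).
    return [name for name, _ in sorted(index_rows, key=lambda row: int(row[1]))]


# (col, position) pairs exactly as listed in Index_df above.
index_rows = [
    ("F_cDc", 1), ("F_NHY", 3), ("F_XUI", 5), ("F_NMY", 7),
    ("P_cDc", 2), ("P_NHY", 4), ("P_XUI", 6), ("P_NMY", 8),
]
new_order = ordered_columns(index_rows)
print(new_order)

# With Spark available (assumed column names "col" and "position"):
#   index_rows = [(r["col"], r["position"]) for r in Index_df.collect()]
#   reordered_df = sample_df.select(*ordered_columns(index_rows))
```

For the sample data this yields the column order shown in the expected output. Columns missing from Index_df would be dropped by the select, so the index should list every column.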
Solution 1:[1]
Add external jars to local .m2 (for local development)
This approach is not distributable: it only installs the jars into your local .m2 repository and nothing more.
To add a jar to the local Maven repository:
mvn install:install-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version>
<path-to-file> is the path to the jar.
Create a new maven repo and distribute it within a project
This approach creates a new Maven repository that contains only the external jars. The repository is placed in the project root, added to git, and referenced by the project's pom.
Anyone who downloads the project then has all Maven dependencies in place without extra steps.
To add a jar to the new Maven repo:
mvn deploy:deploy-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=jar -Durl=file:./.m2/repository/ -DrepositoryId=project-internal -DupdateReleaseInfo=true
Then reference the repo in your pom:
<repositories>
    ...
    <repository>
        <id>project-internal</id>
        <url>file:///${project.basedir}/.m2/repository</url>
    </repository>
</repositories>
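If several jars have to be deployed into the project-local repository, the deploy command above can be assembled by a small script. The following is only a sketch: the helper name deploy_file_command and the example coordinates are hypothetical, and the flags are copied unchanged from the command shown earlier.

```python
from typing import List


def deploy_file_command(
    jar_path: str,
    group_id: str,
    artifact_id: str,
    version: str,
    repo_url: str = "file:./.m2/repository/",
) -> List[str]:
    """Build the mvn deploy:deploy-file argument list for one jar."""
    return [
        "mvn",
        "deploy:deploy-file",
        f"-Dfile={jar_path}",
        f"-DgroupId={group_id}",
        f"-DartifactId={artifact_id}",
        f"-Dversion={version}",
        "-Dpackaging=jar",
        f"-Durl={repo_url}",
        "-DrepositoryId=project-internal",
        "-DupdateReleaseInfo=true",
    ]


# Hypothetical jar and coordinates, for illustration only.
cmd = deploy_file_command("libs/example.jar", "com.example", "example", "1.0.0")
print(" ".join(cmd))

# To actually run it from the project root (requires Maven on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a jar path contains spaces.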
Reference
https://www.google.com/amp/s/roufid.com/3-ways-to-add-local-jar-to-maven-project/amp/
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
