Currently, wmfdata-python has a hive.load_csv function. It would be nice to extend this to importing Parquet files into a Hive-indexed HDFS table. This would also save the user from having to type the fieldspec manually since, unlike a CSV, a Pandas dataframe is aware of its own field names and data types.
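As a rough illustration of why a dataframe removes the need to type the fieldspec by hand, here is a minimal Python sketch that derives a Hive column list from a dataframe's dtypes. The function name field_spec_from_dtypes and the dtype-to-Hive-type mapping are hypothetical assumptions, not part of wmfdata-python. The sketch also assumes the fieldspec takes the form of a Hive CREATE TABLE column list (`name TYPE, ...`).

```python
# Hypothetical mapping from pandas dtype names to Hive column types.
# Illustrative assumption only; not wmfdata-python's actual behavior.
PANDAS_TO_HIVE = {
    "int64": "BIGINT",
    "int32": "INT",
    "int16": "SMALLINT",
    "int8": "TINYINT",
    "float64": "DOUBLE",
    "float32": "FLOAT",
    "bool": "BOOLEAN",
    "object": "STRING",
    "string": "STRING",
    "datetime64[ns]": "TIMESTAMP",
}


def field_spec_from_dtypes(dtypes):
    """Build a Hive column list from (column name, dtype) pairs.

    For a pandas DataFrame `df`, pass df.dtypes.items(); each dtype is
    converted with str(), e.g. "int64", "object", "datetime64[ns]".
    """
    columns = []
    for name, dtype in dtypes:
        dtype_name = str(dtype)
        if dtype_name not in PANDAS_TO_HIVE:
            raise ValueError(
                f"No Hive type mapping for column {name!r} with dtype {dtype_name!r}"
            )
        # Backticks quote the column name as a Hive identifier.
        columns.append(f"`{name}` {PANDAS_TO_HIVE[dtype_name]}")
    return ", ".join(columns)
```

For example, field_spec_from_dtypes([("page_id", "int64"), ("title", "object")]) returns "`page_id` BIGINT, `title` STRING". Raising an error on unmapped dtypes, rather than silently falling back to STRING, is a design choice that keeps type surprises visible to the user.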
Event Timeline
I've put up a draft pull request on GitHub. I still need to make some tweaks, so I haven't requested review yet.
I plan to finish the pull request in July, after I return from sabbatical. But if someone else wants to take over while I'm gone, that's fine with me!
The draft pull request is still there, but it seems unlikely that I'll be able to pick it back up in the near future.