Google Cloud Dataflow Pipeline

Published on April 03, 2021 in IT & Programming

About this project

Open

I have a very specific requirement: read several hundred million plain-text files from a GCS bucket and publish them to Cloud Pub/Sub using Cloud Dataflow. The entire contents of each file must go into a single message.

Each Pub/Sub message should also contain the complete path of the GCS object and the object's "created time". The published message format should be similar to this:

{
  "gcsCreatedTime": "Apr 1, 2021, 12:34:21 PM",
  "gcsPath": "gs://bucketName/xxx/yyy/zzz/file.xml",
  "fileStringContent": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
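
To make the target format concrete, here is a minimal Python sketch of how one such message body could be assembled. It only uses the standard library and does not touch Dataflow, GCS, or Pub/Sub. The helper names `format_created_time` and `build_message` are hypothetical, and it assumes the bucket name, object name, creation time, and file text have already been obtained by an earlier pipeline step.

```python
import json
from datetime import datetime

# English month abbreviations, hard-coded so the output does not depend on locale.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_created_time(created: datetime) -> str:
    """Render a datetime in the style 'Apr 1, 2021, 12:34:21 PM' (hypothetical helper)."""
    hour12 = created.hour % 12 or 12          # 0 -> 12, 13 -> 1, and so on
    meridiem = "AM" if created.hour < 12 else "PM"
    return (f"{_MONTHS[created.month - 1]} {created.day}, {created.year}, "
            f"{hour12}:{created.minute:02d}:{created.second:02d} {meridiem}")


def build_message(bucket: str, object_name: str,
                  created: datetime, content: str) -> bytes:
    """Build the UTF-8 encoded JSON body for one file (hypothetical helper)."""
    payload = {
        "gcsCreatedTime": format_created_time(created),
        "gcsPath": f"gs://{bucket}/{object_name}",
        "fileStringContent": content,
    }
    # ensure_ascii=False keeps non-ASCII file content readable in the JSON.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

For example, `build_message("bucketName", "xxx/yyy/zzz/file.xml", datetime(2021, 4, 1, 12, 34, 21), "<a/>")` yields bytes that decode to a JSON object with the three fields above. In a pipeline, a function like this would typically sit in a per-file mapping step just before the publish step.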

The solution can be written in Java or Python; it doesn't matter which, as long as it works.

Streaming is preferred, but batch is acceptable.

Category: IT & Programming
Subcategory: Other
Project size: Small
Is this a project or a position?: Project
I currently have: I have specifications
Required availability: As needed
API Integrations: Other (Other APIs)

Delivery term: Not specified