Big-Data Developer/Lead at Redmond, WA

  • Redmond, WA
  • Full-time

Company Description



661 367 8000 * 209 

 Email : shravan 

Job Description


This is Shravan from KRG Technologies. We are looking for Big Data developers in Redmond, WA, for the job description below. Kindly forward me your resume, rate, and contact details for further processing.

I also kindly request that you forward this opportunity to your friends or colleagues, so that we can help someone who may be searching for a job or looking for a change.

Role : (Big Data)

Location : Redmond, WA

Duration : Full-time


  • Need 2 onsite resources (MS internal team) with expertise in Big Data (Event Hubs, Spark, Cassandra)
  • Development
      • Build the Event Hubs (EH) integration with the Service Fabric microservices implementation, streaming the processed files from blobs into EH for downstream processing.
      • Anonymized files (~1000 of them, to a size of ~GB) will be given as input.
      • The Service Fabric code portion will be provided.
      • Build the Spark processing that reads from Event Hubs; an implementation in either Python or Scala would suffice.
      • Look at the caching needs; leverage .cache() to retain appropriate results from Spark ‘Actions’ in Spark executors.
      • Our team will evaluate a set of data stores that could be a landing spot after Spark, with Blobs being a required one. We will pick 1 or 2 from this list: SQL DW, Azure SQL DB, Cassandra, and DocumentDB are the other candidate stores, and we will have code snippets and/or guidance.
  • Integration & Deployment
      • Integrate the items above with the completed items (Azure Data Factory with ARM provisioning, picking up from the ADF pipeline that lands files onto blobs).
      • Apply best practices for E2E capacity planning and deployment.
      • Integrate the deployment with the existing set of tools and processes.
  • Testing
      • Build a unit test framework that can test each building block in isolation (ADF → Blobs, Blobs → Service Fabric, Service Fabric → EH, EH → Spark, Spark → <Data Store>).
      • Build an E2E test environment with telemetry on latency and throughput, with percentiles. Leverage APM tools as appropriate.

Thanks & Regards


Bigdata, Spark