A Day in the Life of a Data Engineer
‘A Day in the Life of a Data Engineer’ is presented here as a dramatic story with fictional characters, fictional companies, fictional data, and an imagined workplace. The story tries to capture as many of the tasks, activities, skills, and responsibilities of a Data Engineer as possible. It offers a working knowledge of a typical day in the life of a Data Engineer, presented in a fun and thrilling way, against the backdrop of COVID-19. We invite you to sit back and enjoy the story!
Staff Bus - the Daily Thrill
It was 09:00 am and Navya was on her way to the office in the company staff bus. A cacophony filled the bus, everyone trying to speak at the same time; she found it both distracting and comforting. Although the distance was just 9.2 km, it took nearly 50 minutes to maneuver through the Bengaluru traffic to get to the office. There was some comfort, thought Navya, in not riding or driving herself in this heavy traffic. Her PG roommate, Afreen, a biochemist who rode an Ather 450 to work, always complained about the traffic, the pollution, and the noise, not to mention the near misses she had experienced over the last year! Looking around the bus, Navya caught sight of Garg, as usual, with his mouth full, munching on a samosa, and there were several more in the packet in his hand – the samosa vendor lady who had a stall at the stop where he got in always enjoyed a good business day when Garg was her first customer – which, if truth be told, was every day! The drone and the gentle vibration of the bus, as well as the comforting background noise, had perhaps made Navya sleepy, or so thought Raj when he turned to speak to her and found her reclining on the seat with her eyes closed. But Navya was far from asleep – her thoughts were on the data migration issues that had been plaguing her since last week…
The bus arrived punctually at 09:50 outside Shamazon India’s main entrance and everyone was in the office by 09:55. Navya did not go to the coffee machine like the rest; she immediately checked her mails – this was her usual morning ritual. And, as usual, she deleted most of the forwarded messages without reading them but took note of the RSS feed about a blog on Hypothesis Testing where the blogger provided scripts using R – she would read it later. She straightened the name plaque on her desk, which read: Navya Riddhi – Senior Data Engineer. Navya then checked out yesterday’s customer activity on Shamazon’s data system – the two most important statistics that she had to validate every morning were the number of visitors who converted to buying customers and the value of transactions for the day. This data could cause great joy or immense pain in the stock market – so before the ‘data hits the fan’ it was Navya’s responsibility to double-check and validate them. She checked the fault logs for the night, confirmed that online transactions matched banking records, and that the revenue gap was covered by the EMI transactions. She also checked the logs from the distribution warehouse.
If all of these corresponded, she would append her validation signature. Of the 2,72,346 visitors to the site, 1,48,367 had carried out transactions, a 54.48% conversion rate – that should bring a smile to the face of big boss Mr. Deff Seadoss, thought Navya with a wry smile. The value of transactions amounted to Rs 7.65 lakhs, a moderate cash flow day, but that’s how it goes in this industry – the Data Analysts and marketing team would have to pull up their socks to ascertain which products could spike sales tomorrow, thought Navya. The Chief of Analytics would also do a final validation before posting the info on the performance dashboard that is accessed only by a select few Senior Managers and Board Members. At 10:25, she made her way to the main boardroom on the 7th floor for the Analytics Team Leaders’ daily operations meeting. Sajid, the Head of Analytics at Shamazon, was there early, as usual – he didn’t like stragglers.
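For readers who want to see the arithmetic behind Navya’s morning check, here is a minimal Python sketch. The function name conversion_rate is our own invention for illustration and not part of any real system; the visitor and buyer counts are the ones from the story.

```python
def conversion_rate(visitors: int, buyers: int) -> float:
    """Return the percentage of site visitors who completed a transaction."""
    if visitors <= 0:
        raise ValueError("visitors must be a positive number")
    if not 0 <= buyers <= visitors:
        raise ValueError("buyers must lie between 0 and the visitor count")
    return 100.0 * buyers / visitors


# Figures from the story: 2,72,346 visitors, of whom 1,48,367 transacted.
rate = conversion_rate(272_346, 148_367)
print(f"Conversion rate: {rate:.2f}%")  # Conversion rate: 54.48%
```

The two guard clauses mirror the spirit of her validation: a buyer count larger than the visitor count would point to a logging fault, not a great sales day.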
COVID-19 Virus Affects Data!
The big news of the meeting was that the country might go into lockdown. A new coronavirus that seemed to have started in China was hitting the world, and medical Data Analysts had pointed out the need for isolation to contain the spread of the virus in the absence of a cure. (It seemed that a western leader had predicted that a vaccine and cure would be found quickly; hope this is true, thought Navya, but somehow not many people believed this leader!) The government had already barred passengers who had been to China, Iran, and some other countries from entering India. Shamazon’s Data Analytics team predicted that FMCG sales would spike due to the impending lockdown, under which all social movement of people would be stopped, including trips to the local provision store. People would still need to eat, take medicines, and so on, so the sales of perishables and general FMCG items were expected to spike. The Analytics team believed that many of the Shamazon staff would have to work from home; there was a need to check who had laptops, who would be given VPN access, and the level of access given to each person. The extent to which Shamazon’s Distribution Centres would stay open was not yet known. Distribution Management would in the meantime prepare SOPs to promote workplace and home delivery hygiene practices and social distancing, including temperature tests for workers who had to go to work. Navya made a mental note to contact her folks in Jaipur to check that they were well – her grandparents lived with them and it seemed that older folk might be more prone to infection. There was one further complication arising from this impending pandemic – the data migration project would now be fast-tracked, so said Sajid.
The Head of Analytics was prepared to bring in extra resources if need be to assist Navya and her team – in fact, while the meeting was in progress Sajid’s secretary, Matilda, made a call to Aptus Data Labs, also based in Bengaluru, to see if they could provide good Data Engineers to speed up the project. It was known that Aptus conducted one of the better Data Science courses and had a reputation for turning out industry-ready Data professionals; Sajid preferred the best.
When she returned to her cabin, Navya’s team of 4 – herself, Basil de Souza, Meenakshi Das, and Bittu Joshi – met there for their daily Production Meeting. It was Meenakshi’s birthday today. It seemed Navya was the last one in her team to wish her; they hugged warmly. The team spent some time on production updates and also briefly discussed the issues that might arise from the expected spike in online commercial activity as well as the imperatives of working from home. The main discussions, however, were on the data migration project as well as the trials and tribulations the team would experience if they continued this project under lockdown conditions – it required real-time coordination of each one’s work. They would also need to work closely with the Data Science team to figure out the predicted rise in sales and whether they would need to temporarily scale up and add more servers to handle the traffic. Her team had set up servers to handle regular traffic; they would definitely need to figure out the magnitude of this predicted big spike in sales and balance the new infrastructure and operational costs against increased revenue – the Data Science guys should work this out. How would the new resources from Aptus Data Labs work with her team? Would they work professionally and in harmony with her team? She made a mental note to get more information from HR about Aptus Data Labs’ Data Engineering people.
Project Data Migration
A year ago Shamazon bought out the popular fashion e-commerce company Grab India; however, Grab India continued trading as a separate entity. This year the US Head Office decided that Grab India must be completely integrated into Shamazon, and thus began the huge task of migrating data from Grab India’s platform to Shamazon’s platform.
The best thing about Shamazon’s platform is that each project’s team has the freedom to choose whichever database is best suited for their development. Navya had set out in the plan of action that her team would essentially need to analyze Grab India’s current data architecture, figure out how well optimized it was, and determine whether it could be scaled to match Shamazon’s current optimization levels. Now that an ambitious sales target was in the works, this would have to be revisited and restructured accordingly. Grab India’s data was based on Oracle DW for warehousing; while it is a stable and resilient platform, the preferred DB around the 9th floor was PostgreSQL for its advanced JSON support. PostgreSQL’s approach to multi-version concurrency control, with its immunity to dirty reads, and a bias for open source technology allowed Shamazon developers to toy with their own plugins to customize their usage and improve the performance of the DB. After discussing this with Sajid, Navya decided that for the migration she would need to isolate and purge temporary backup tables left over from past maintenance and use transaction scripts to handle the conversions to PostgreSQL’s rich data types. To handle the different functions and input parameters, she decided that ora2pg would be the ideal tool for this transformation. The entire workflow would have to be developed and tested with simulations before being deployed, so that the transition happened with little to no downtime. For the ETL process, experience told Navya that it would be best to start from the most central table in the snowflake schema, and then work outwards.
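Navya’s ‘start from the centre and work outwards’ rule can be sketched as a breadth-first walk over the schema. The snippet below is only an illustration under our own assumptions: the table names and the SCHEMA_LINKS mapping are hypothetical, not Grab India’s actual schema, and outward_order is a helper we made up for this story.

```python
from collections import deque

# Hypothetical snowflake schema: each table maps to the tables one step
# further out from the centre. None of these names come from a real system.
SCHEMA_LINKS = {
    "orders": ["customers", "products", "dates"],
    "customers": ["cities"],
    "products": ["categories", "brands"],
    "dates": [],
    "cities": ["states"],
    "categories": [],
    "brands": [],
    "states": [],
}


def outward_order(links, central):
    """Return tables in breadth-first order, starting at the central table."""
    seen = {central}
    queue = deque([central])
    order = []
    while queue:
        table = queue.popleft()
        order.append(table)
        for neighbour in links.get(table, []):
            if neighbour not in seen:  # visit every table only once
                seen.add(neighbour)
                queue.append(neighbour)
    return order


print(outward_order(SCHEMA_LINKS, "orders"))
```

This order suits analysing and mapping the tables. If foreign-key constraints are enforced while the data is actually loaded, the referenced outer tables may have to be loaded first, in which case the same list can simply be processed in reverse.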
Jeera Rice & Daal Fry
With the mounting pressures and the impending coronavirus conundrum in her mind, Navya decided to lunch in the cafeteria. The Potluck on Five (the cafeteria was on the fifth floor) was buzzing like a bee by the time she got there. Potluck served some of the best fare among corporates in Bengaluru – but she opted for simple Jeera Rice and Daal Fry today to calm the inner turmoil. Looking at the others lunching she couldn’t help wondering if Potluck on Five, this awesome cafeteria, will still serve such great food to the working professionals if and when the lockdown is implemented; most of them came from other states and therefore relied on this cafe.
Navya made the call home after her lunch. Mom had prepared her favourite dish today, mushroom lasagna. Papa was at work; he had apparently mentioned to mom the uncertainties around the cobra virus (he called it the cobra virus as, in his opinion, it was as deadly as the venomous snake; some people were also, quite judgementally, calling it the China virus) and that he and his work friends had discussed stocking up on rice, atta, oil, dals, and other essential kitchen items. Navya told her mom not to panic, provisions would still be available – and that if they ran short of anything she would get it delivered from an online vendor. Arjun, her little brother, had already gone to the cricket academy – that boy and his cricket! While it was great that he was passionate about the sport, he often used it as an excuse to procrastinate on school work – if this virus can stop work, who knows, it may just get him to focus a bit more on studies, she thought! She asked them to be careful at home and especially vigilant with Dada and Dadi.
Having attended to the urgent routines, Navya decided to catch up on articles on new databases as well as the updates on the new version of Elasticsearch. It does feel kinda annoying that upgrades hit them so frequently, but there is also the comfort of knowing that, because the product is so widely used, bugs discovered in real-world use and feature requests get picked up quickly, and the developers are fast at fixing, testing, and deploying upgrades. With the data migration project in mind, she went through some of the promotional and forum mails, and before she knew it, afternoon tea had arrived. The entire 2nd floor surrounded Meenakshi’s desk. Meenakshi had ordered Death by Chocolate from Chef Baker’s as well as Rajasthani Malai Kulfi – yup, there was a delectable kulfi for everyone. Navya did the honours of feeding the first piece of cake to Meenu; there was the usual scream from everyone that she should mess Meenu’s face with cake – nope, Navya would not deface her good friend.
Collaboration the Key to Success
In preparation for the afternoon meeting with her team, Navya checked some new posts on the data engineering subreddit, especially those about others’ experiences in migrating from Oracle DW. She also had some conversations on the related Discord channels – one thread was with a Data Engineer with 12 years of experience, one of those with several insightful stories from the era when data infrastructure was handled by software engineers. When the team arrived, they first discussed the documentation prepared by Grab India’s team, i.e., the data pipelines in action for ingestion, retrieval, recommendations, and analysis. Bittu, who had been tasked to study and summarise it, explained to the team the way the data was being utilised by Grab India. Basil questioned the schema and the metadata – it was his opinion that the data architecture had to be changed to suit Shamazon’s applications. A long discussion ensued on transforming the database to make it PostgreSQL compatible. Bittu wondered whether further transformations were needed to conform with the additional applications utilising the data in Shamazon. When Navya asked Meenakshi about her views on the performance criteria of Grab India’s architecture, her view was that optimisations would be needed to handle the high volume of transactions expected, as well as some additional fields that might be needed for the in-depth analyses and real-time reporting. This would then bring the DW in line with the Data Science team’s regular analyses – all part of the data profiling and cleansing process. This view concurred with Navya’s own – she was glad that an overdose of Death by Chocolate hadn’t led to the death of Meenu’s reasoning as yet! It clearly wasn’t a straightforward ETL task. A key item that emerged from the meeting was the possibility of having common customers in the two databases.
It was Basil’s suggestion that perhaps someone from the marketing team should join the daily project meeting so that a marketing strategy could be formed to promote Grab’s special products as Shamazon’s own; the reasoning behind this strategy was that Shamazon should not lose any common customers loyal to Grab India. From the perspective of the Data Engineers, the solution was not seen as complex – customer details would be joined on likely common fields such as email, phone number, and other criteria – but it would have to be tested on at least 10% of this data set to ascertain the best approach for the join before deploying on the entire customer base. They agreed on converting to the considerably larger customer metadata store that Shamazon had in place. They would also need to take a good look at the APIs channeling the data – whether they would need to be overhauled or simply modified to suit the new data architecture. A Data Engineer should, in Navya’s opinion, always have a basic understanding of software development and REST APIs – she remembered that she must check out the credentials of the temporary Data Engineers whom Sajid had recommended; after all, Sajid would not be the one working with the new staff, and she was answerable for the success of the migration project. Basil and Meenakshi would put together the detailed plan of action for the Data Migration project, outlining the broad tasks to be performed, broken up into workable portions for each team member, and would submit the plan to Navya in 2 days. It would be Navya’s duty to then allocate tasks and responsibilities – she would do this after speaking to the new hires for this project. For now, Navya also needed to give HR a count of the additional resources they needed. She first called Aptus Data Labs after getting the contact number from Matilda and spoke to one Sthita Pragnya there.
The nice lady of Aptus went through some of the accomplishments of the available Data Engineers; Navya requested Sthita Pragnya to send three of them to come see her at Shamazon tomorrow; she would select 2 after satisfying herself that they had the wherewithal to work harmoniously with her team. She believed that 2 extra resources would be sufficient to complete the project. The formalities of completing the company requisition for new staff were quickly done and forwarded to Sajid for approval. HR could append the bio details of the hires.
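The common-customer join that the team discussed in the meeting, matching on email and phone number and trying it out on roughly 10% of the data first, might look something like the Python sketch below. Everything in it is an assumption made for illustration: the field names, the sample records, the rule of comparing only the last 10 digits of a phone number, and the helper names are ours, not Shamazon’s or Grab India’s.

```python
import random


def normalise_email(email):
    """Lower-case and trim an email so cosmetic differences do not block a match."""
    return email.strip().lower()


def normalise_phone(phone):
    """Keep digits only and compare on the last 10 (an assumed convention)."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:]


def find_common_customers(grab_rows, shamazon_rows):
    """Return (grab_id, shamazon_id, matched_on) for customers found in both sets."""
    by_email, by_phone = {}, {}
    for row in shamazon_rows:
        email = normalise_email(row.get("email", ""))
        phone = normalise_phone(row.get("phone", ""))
        if email:
            by_email[email] = row["id"]
        if phone:
            by_phone[phone] = row["id"]

    matches = []
    for row in grab_rows:
        email = normalise_email(row.get("email", ""))
        phone = normalise_phone(row.get("phone", ""))
        if email in by_email:  # email is tried first, phone is the fallback
            matches.append((row["id"], by_email[email], "email"))
        elif phone in by_phone:
            matches.append((row["id"], by_phone[phone], "phone"))
    return matches


def sample_fraction(rows, fraction=0.10, seed=42):
    """Draw a reproducible random sample (10% by default) for a trial run."""
    if not rows:
        return []
    size = min(len(rows), max(1, round(len(rows) * fraction)))
    return random.Random(seed).sample(rows, size)


# Hypothetical records, made up purely for this example.
shamazon_customers = [
    {"id": "S1", "email": "Asha@example.com", "phone": "+91 98765 43210"},
    {"id": "S2", "email": "ravi@example.com", "phone": "080-5550-1234"},
]
grab_customers = [
    {"id": "G1", "email": " asha@example.com ", "phone": ""},
    {"id": "G2", "email": "", "phone": "9876543210"},
    {"id": "G3", "email": "new@example.com", "phone": "9000000000"},
]

print(find_common_customers(grab_customers, shamazon_customers))
# [('G1', 'S1', 'email'), ('G2', 'S1', 'phone')]
```

On the real data, the trial would run find_common_customers on sample_fraction(grab_customers) and have someone review the matches by hand before the join was let loose on the entire customer base.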
Lockdown a Knockdown?
Navya was the last one to get into the bus that evening; she sat next to Meenakshi. It was one of those days, she thought. There was a new driver and he was wearing a face mask – it seemed the usual driver was not well and the Transport Manager didn’t want to take the chance of exposing staff to anyone who might be infected. On the way home almost everyone was glued to the phone screen – the early news was on and a lockdown was finally being announced by the PM; whew, this was going to cause quite a bit of upheaval. Everyone was caught up in their own thoughts. Meenakshi cut across Navya’s thoughts and persuaded her to join her for a bit of a chill session at CCD. They got off at the Electronic City CCD – they needed the time out! The place was only about half full; little did anyone there realise that this would be the last time that they would be able to chill out at CCD, or at any place for that matter, for a long time! The two Shamazon girls, however, knew that their kind of work, Data Science and Big Data Analytics, would be one of the important tools, together with medical science, to fight this disease if it ever got to the level of a pandemic!