In our latest feature for the People in Research series at Kadence International, we’re thrilled to present an insightful conversation with Vincent “Vinny” Yabor, Junior Data Scientist at our Americas office.
Vinny’s is not your typical career trajectory. His journey from a prospective career in Math Education to the dynamic world of data science is a fascinating tale of adaptability, passion, and discovery. In this interview, we delve into the pivotal moments and decisions that shaped his path to becoming a key player in our data science team.
Learn how Vinny navigated challenges and opportunities in his transition to data science as he shares his unique perspectives on working in a global market research agency and how his diverse experiences have contributed to his current role.
Additionally, Vinny shares some pearls of wisdom for those aspiring to enter the field of data science, reflecting on the importance of continuous learning and adaptability in a rapidly evolving industry.
Curious about the journey from academia to the forefront of market research? Vinny’s insights are valuable for everyone, from industry veterans to those just starting in the profession.
Here’s an engaging and enlightening conversation that sheds light on the human side of data science in market research.
What inspired you to pursue a career in data science, and what was your biggest challenge in transitioning into this field?
Throughout high school and undergrad, I was in a constant state of motion. I’d had my head down in the books so much that I never actually took the time to figure out what I wanted to do for a career. I was a good tutor, so I pursued a Master’s in Math Education at Stony Brook University.
In 2020, I finally had time to slow down once the pandemic hit. This allowed me to explore and discover my passions. That’s when I found data science. I was amazed at how much could be predicted and explained with data, from modelling climate change to predicting if someone would show up for a scheduled appointment or not. It was all so fascinating.
I already had the math foundation set for data science, so all that was left to learn were coding and machine learning concepts. I dropped out of my Math Education program and applied to the Statistics and Data Science program at Stony Brook. I also earned many online certificates in Data Science in the meantime to supplement my education.
While in grad school, I worked as a Data Analyst at an email marketing company called Alchemy Worx. This was my first taste of real-world data. I left Alchemy Worx when I graduated and took on some personal projects and more online courses and certificates. Finally, I ended up here at Kadence as a Data Scientist in May of 2022!
My biggest challenge transitioning into data science was that, at first, I had a lot of self-doubt. I questioned whether I truly belonged in this field. There were so many machine learning concepts to absorb, and I became overwhelmed at various points in my education. But data science is extremely broad, and my education in this field will likely never end. Once I came to terms with that, I became more motivated and confident. Now I can proudly say I’ve made it!
Can you describe a specific project at Kadence International where you felt particularly challenged and how you overcame that challenge?
One of my primary functions as a Data Scientist here at Kadence is to monitor and maintain data integrity. When we work with clients to conduct on-site studies, we collect a lot of participant data, and our quota targets for each study rely directly on it. However, one data point has repeatedly been inconsistent between what scheduled participants report and what is recorded on-site during a study: the Fitzpatrick skin score. This is a scale used in dermatological research, ranging from one through six, that classifies how a person’s skin reacts to UV light. Because skin tone correlates directly with UV reactiveness on this scale, people can in principle determine their own score. In practice, the reference scales sent to prospective participants varied across studies, with some more representative of true skin tones than others, and this contributed to inconsistencies when a person was evaluated on-site.
Further, many individuals would classify themselves with a score that did not accurately correspond to their skin tone or reactiveness. As a result, people’s scores would be changed during the study, and quota counts would shift considerably. I initially toyed with analysing hand images from previous studies to predict skin scores in future images, but that ultimately didn’t work out because manually determining a skin score is inconsistent and inaccurate in the first place. I felt like I just couldn’t crack it.
After spending quite some time thinking and researching how to mitigate inconsistencies, I came across a study from the Florida Institute of Technology and the University of Notre Dame. The study, “Analysis of Manual and Automated Skin Tone Assignments for Face Recognition Applications,” aimed to develop a way to eliminate inconsistent Fitzpatrick skin tone ratings between various human raters. I adapted some of their methods into Python code and did some internal testing of my own.
I found that, when given a standardised Fitzpatrick reference scale, images showing people’s skin tone, and various raters, the “eye test” to determine an individual’s skin score is unreliable. The objective algorithmic approach, which is based on a type of image processing, has seen the most consistency thus far. Ensuring data integrity and quota consistency through a difficult data point has involved substantial research, trial and error, and persistence.
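For illustration, here is a minimal sketch of the kind of objective, pixel-based approach that study describes, assuming an Individual Typology Angle (ITA) calculation over a pre-cropped skin image. The thresholds and function name are illustrative, not Kadence’s production code:

```python
import cv2
import numpy as np

# Commonly cited ITA cut-offs (degrees) mapped to Fitzpatrick-style scores.
# These thresholds are an assumption for illustration, not Kadence's values.
ITA_THRESHOLDS = [(55, 1), (41, 2), (28, 3), (10, 4), (-30, 5)]

def estimate_fitzpatrick(image_path: str) -> int:
    """Estimate a Fitzpatrick-style score (1-6) from a cropped skin image."""
    bgr = cv2.imread(image_path)
    if bgr is None:
        raise FileNotFoundError(image_path)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    # OpenCV stores L* scaled to [0, 255] and offsets b* by 128.
    l_star = lab[:, :, 0].astype(float) * (100.0 / 255.0)
    b_star = lab[:, :, 2].astype(float) - 128.0
    # Individual Typology Angle over the mean skin pixel values.
    ita = np.degrees(np.arctan2(l_star.mean() - 50.0, b_star.mean()))
    for threshold, score in ITA_THRESHOLDS:
        if ita > threshold:
            return score
    return 6
```

The appeal of an approach like this is that, unlike the “eye test”, it returns the same score for the same image every time.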
How has your background in Statistics and Pure Mathematics influenced your approach to data science, especially in the projects you handle at Kadence International?
Data science mainly involves coding, but the code is built on mathematical concepts that are crucial to understanding what’s happening under the hood. If I’m performing an analysis, it helps to know how a metric is determined. This matters most for decision-making and for explaining my methods.
A common misconception is that math is all about numbers. However, in practice, math is everywhere. It’s the logic that goes into decision-making. It’s the abstract concept that a mathematical formula is derived from. It’s the difference between anecdotal evidence and thorough research.
At Kadence, I consider all logical avenues before I follow through with a project. For example, a breadth of machine learning models can be applied to any given task, and one model may fit a dataset better than another. I can see this by evaluating various metrics during the testing stage and deciding based on those values.
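As a sketch of that model-comparison workflow (with a stand-in dataset; none of this is Kadence code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data; in practice this would be the project's real dataset.
X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Score each candidate on the same cross-validated metric and compare.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```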
I can even save time and resources by ruling out methods that conceptually don’t make sense for the task. I often remind myself of a famous quote from the statistician George Box: “All models are wrong, but some are useful.” It reminds me that there will be no perfect results or perfect models. Analyses and models are only as good as the data and the scientist.
Kadence International emphasises one-to-one connections in its approach. How do you ensure your data-driven projects maintain a personal touch or consider the human aspect?
During on-site studies, whenever we receive feedback from a recruiter or client on our data collection and scheduling sheets, I do my best to take that feedback into account. Since recruiters use the spreadsheets we build constantly, we must make sure the user interface is friendly and intuitive. Our work does not end once these platforms are built and implemented, though. I like to continually check how things are running and make adjustments accordingly while staying in contact with project managers.
At Kadence, you’ve worked extensively with AWS Lambda, Google Sheets API, and the cv2 Python library, among others. Which technologies do you find most valuable in your role and why?
I live and breathe Python, the Google Sheets API, and AWS Lambda in my role. Together, they are the most valuable tech stack at my disposal. Whenever an on-site study is about to commence, I assist the rest of the data team in building complex Google Sheets for data collection and scheduling. This involves using the Sheets API in Python to make the spreadsheets dynamic and functional, AWS Lambda for cloud deployment and reliable code execution, and Google Apps Script for even more spreadsheet functionality. Depending on how many recruiters we partner with, we could have several interconnected spreadsheets per study and per site. We track scheduled and cancelled appointments, calendars and open time slots for each study, quota counts, on-site participation, and more. Further, most studies vary in exactly what’s requested of us, so there is always new code to write as we improve on what we built for a previous study. This rigorous and iterative approach helps ensure that scheduling is efficient and that metrics stay accurate throughout a study.
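A minimal sketch of how those pieces can fit together: an AWS Lambda handler that writes quota counts to a sheet via the Sheets API. The spreadsheet ID, range, credentials file, and values are placeholders, not Kadence’s actual setup:

```python
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/spreadsheets"]
SPREADSHEET_ID = "your-spreadsheet-id"  # placeholder

def lambda_handler(event, context):
    """AWS Lambda entry point: push current quota counts to a tracking sheet."""
    creds = Credentials.from_service_account_file(
        "service_account.json", scopes=SCOPES
    )
    service = build("sheets", "v4", credentials=creds)
    # Write one row of quota data (site, scheduled, target) to the sheet.
    service.spreadsheets().values().update(
        spreadsheetId=SPREADSHEET_ID,
        range="Quotas!A2:C2",
        valueInputOption="RAW",
        body={"values": [["Site A", 42, 50]]},
    ).execute()
    return {"statusCode": 200}
```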
You mentioned the development of machine learning models for projecting participant attendance and demographic counts. How do you ensure the accuracy and fairness of these models?
Projecting participant attendance can be tricky, and it’s easy to overlook the fairness aspect of it. When building models, I keep the variables that genuinely influence the model and drop the rest. Further, some variables may influence the model’s output without providing any real explanation; this is where correlation does not necessarily imply causation. The attendance predictor treats past cancellation rates and past show rates as the most influential variables, among others.
This makes sense because if someone has a good track record of attending appointments in past studies, they will be less likely to cancel in the future.
On the other hand, if someone cancels or reschedules often, they are more likely to cancel again in the future. The trickiest part is when we have new participants who haven’t cancelled or shown up in the past. This is where other factors, like the time of day and day of the week of other people’s past appointments, come into play. But there are certainly improvements to be made there. As for the model’s accuracy, a common practice is to split the data into a training set and a test set: the model learns from the training set, and accuracy is evaluated on the test set. At a high level, if the accuracy score, among other metrics, is strong, then the model should be good enough to use.
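Here is a hedged sketch of that train/test workflow on synthetic attendance data. The feature names (past show rate, cancellation rate, hour of day) mirror the ones mentioned above, but the data and model choice are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "past_show_rate": rng.uniform(0, 1, n),
    "past_cancel_rate": rng.uniform(0, 1, n),
    "hour_of_day": rng.integers(8, 18, n),
})
# Synthetic target: better past attendance makes showing up more likely.
attended = (df["past_show_rate"] - df["past_cancel_rate"]
            + rng.normal(0, 0.3, n)) > 0

# The model learns from the training set; accuracy comes from the test set.
X_train, X_test, y_train, y_test = train_test_split(
    df, attended, test_size=0.2, random_state=0
)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```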
What role did your experience at Alchemy Worx, especially with email marketing and segmentations, play in preparing you for your current position?
My role as a data analyst at Alchemy Worx enabled me to build my foundational skills in statistics and programming before transitioning to a data science role here at Kadence. By extracting insights from our database of client email marketing data, I could find which ad campaigns worked best for given sets of demographics. Based on these findings, I suggested future segmentations of email recipients who’d receive certain ads. My experience also involved coming up with ways to innovate our regular processes. For example, I developed a web application that could perform a statistical hypothesis test to determine whether there was a significant difference between two sets of email ad campaign results. My drive and passion for innovation have stuck with me since then and are evident in my current role at Kadence. Further, Alchemy Worx exposed me to a lot of data exploration and data cleaning. These skills are invaluable for any data analyst or data scientist, and I’m grateful I was able to hone them during my time there.
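The internals of that web application weren’t described here, but a two-proportion z-test is one standard way to compare two campaigns’ results. A small sketch with made-up counts:

```python
from statsmodels.stats.proportion import proportions_ztest

clicks = [120, 95]    # clicks for campaign A and campaign B (made up)
sends = [2000, 2000]  # emails sent per campaign (made up)

# Test whether the two click-through rates differ significantly.
stat, p_value = proportions_ztest(count=clicks, nobs=sends)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Significant difference between the two campaigns.")
else:
    print("No significant difference detected.")
```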
How do you think your teaching experience at Air Tutors influenced your ability to communicate complex data findings to stakeholders or non-technical audiences?
I worked with students of various educational backgrounds and proficiencies at Air Tutors. In one session, I’d find myself explaining advanced statistics, and in the next, I’d be going over something tamer, like algebra. I adapted my explanations to audiences with different backgrounds. Many of the students I worked with were missing foundational math skills that preceded the material they were there to learn. For example, some students had taken statistics but struggled with basic probability. Sometimes, stepping back in my approach was necessary to catch them up on the skills they were missing.
Breaking down concepts that felt very complex to these students into simpler terms has become a transferable skill I use as a data scientist when speaking to a non-technical audience. Over time, those experiences helped me realise that, when communicating complex findings, I should adapt my explanations to my audience’s level.
What are your thoughts on balancing theory and practice in data science, especially given your strong academic background?
Theory is a necessary precursor to practice. Knowing the underlying concepts behind the code is essential since it provides context and insight into one’s work. Balancing theory with practical application in data science also depends on the situation. For example, complex algorithms demand a deeper understanding of theoretical concepts; otherwise, you may end up misusing an algorithm. On the other hand, something more practical, such as generating a chart or making a table, may only require surface-level knowledge of the underlying concepts. Time and resource constraints matter, too. If both are tight, it’s beneficial to prioritise application over theory.
Kadence International, being a global market research agency, must have a diverse work culture. How does this diversity influence your work, and what have you learned from collaborating with international teams?
A diverse work culture means teams are more innovative and creative due to all the various backgrounds and perspectives. I’ve become a better problem solver since joining Kadence, partly due to working with international teams.
If I’m discussing something technical with someone whose first language differs from mine, it’s easy for us to get lost in terminology. I always see that as a learning opportunity to expand my cultural and technical knowledge. There are also nuances in datasets between regions. For example, UK English spells many words slightly differently from US English. And for dates, the month and day are switched in some places. These are things I’ve learned to consider when working with my datasets. Being part of a diverse company has helped me tremendously.
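The date nuance is easy to see in code: the same string parses to two different dates depending on the regional convention you assume.

```python
import pandas as pd

s = "03/04/2024"
print(pd.to_datetime(s, dayfirst=False))  # 2024-03-04 (US: March 4)
print(pd.to_datetime(s, dayfirst=True))   # 2024-04-03 (UK: 3 April)
```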
Kadence’s work-from-home policy offers some flexibility. How do you manage your work-life balance, especially when working remotely?
Working remotely gives me some much-needed flexibility. The lack of a commute means I save money and have more time for my family. I also feel less stressed than in person since I can open my laptop later in the day if I can’t finish something by 5. This makes it easier for me to meet deadlines without making sacrifices at home. It’s also easy to take short breaks during the day to clear my head or stretch. This helps with my productivity and reduces burnout. And if I ever get stressed, I can play with my cat for a few minutes to recharge!
While being at home, it’s still crucial to establish a routine and incorporate time management. Since it’s easy for the line between one’s professional life and personal life to get blurred while working from home, having clear boundaries is essential. I have a dedicated workspace at home and regular start and end times as in an office. So, I’m in full work mode while I’m at my laptop during work hours. And I’m off the clock when I’m away from that space outside of work hours.
Outside of work, what do you enjoy doing in your free time?
My favourite thing to do is spend time with my girlfriend of nearly three years. This usually involves nature walks, games, movies, cooking together, and playing with our new kitten Rigby! I also love good video games like Zelda, Spider-Man, Star Wars, and more. Further, I like to practise new skills in AI and machine learning.
Given your extensive skills and credentials, what advice would you give to someone aspiring to become a data scientist in today’s job market?
Although the job market can be unforgiving, my advice would be to never stop learning and never stop working on personal projects. A solid Python portfolio is key to breaking into this field. And even though most jobs require a Bachelor’s degree, a Master’s will help you stand out. Further, if you’re debating whether to take an online certification or course, do it! Plenty of great programs exist on websites like Coursera and edX. I personally loved the MicroMasters program in Data Science from UC San Diego on edX! Some courses are a bit math-heavy, but having a mathematical foundation is essential for an aspiring Data Scientist. This is the order in which I learned:
1. Math and Statistics
2. SQL querying and Database Management
3. Basic Python Programming
4. Data Manipulation/Exploration and Data Visualisation
5. Hands-on machine learning
With sufficient rigorous education and the ability to showcase your skills through flashy web apps, dashboards, and machine learning models, becoming a Data Scientist should be achievable!