Prepared by: Student’s Name
Date: October 3, 2020
Walden University, CBE Data Science Essentials, Using Data Science Frameworks to Solve Business Problems, Apply a data science framework to a business problem
Part 1: Establishing the Business Understanding
The First Step of the Methodology
Some people want to skip the methodology and go straight to solutions, and some actually do. Doing so undermines even the best intention of solving the problem. The first step is crucial because its purpose is to share a methodology that can be put to work in data science to ensure that the data are used correctly, including how they are manipulated (IBM, 2016).
Defining the Business Understanding and Analytical Approach
Nutri Mondo chose an analytic approach after defining the business. They defined it as a business that promotes healthier eating by teaming up with other organizations in different areas. The analytic approach used was diagnostic (statistical analysis). A diagnostic analytic approach asks what happened and why it is happening (IBM, 2016). Nutri Mondo wanted to find statistics about people and their eating habits in different areas. Specifically, they wanted to gather information about how specific health problems are connected to how easy it is to access fresh food in an area.
The Importance of the Process
This first step in the process is important for data scientists beginning their projects for a couple of reasons. The first reason is that it helps them define and gather the information they need. After settling on the definition and purpose of the business, they need to decide which analytic approach to use. The analytic approach will help them gather the information they need to finish defining the business. The second reason is simply that it helps ensure that the outcome is the correct answer to the question being asked (IBM, 2016).
Part 2: Data Requirements and Data Collection
Data Requirements and Data Collection Processes
After business understanding and the analytic approach, the next topic is the data requirements for Nutri Mondo. In the data requirements step, the Nutri Mondo team decides what kind of data they need to collect, how they will collect the data, how to understand the data, and how to set the data up to accomplish the desired outcome. The team had already planned to use free, public government data. They decided on this because it gives them an opportunity to compare data and get an idea of what is happening. The next step in this process is data collection. In this step, the team can decide to change the amount of data and revise the data requirements. The team first considered collecting their own data by having regional team members collect data from the regions where they are located. Eventually, they decided against this because the information might go to waste. They decided to stick with public data and analyze it. This makes it easier for Nutri Mondo to compare data, because it cuts down the time required and lets the team focus on certain variables. Because this information is already available to the public, it can give Nutri Mondo a view of what is happening and could help build a solid relationship with local governments.
Part 3: Data Sets
Constructing Data Sets
The data science team used multiple tools and visuals to construct their data set. First, they used existing data sets to look at specific relationships between their identified variables. The team also made use of IBM's Watson Analytics, a tool that takes in data and produces different visuals. With these visual aids, the Nutri Mondo team looked at the overall volume of the data and realized that there was a lot of information, which would require data cleaning. The data team also needed to determine what to do with NaN values, that is, values that were not recorded. Positive and negative correlations, as well as standard deviation, were used to lay out and show which relationships the data science team wanted to explore further.
The team used the IBM tool to look further at the specific variables that they wanted to explore. The first pattern was food insecurity, which was represented by a bar graph. A scatter plot also showed a clear positive correlation (both increase together) between obesity and the share of people receiving SNAP assistance. The team was also able to notice an inverse relationship between obesity and the Asian population. Another scatter plot identified a positive correlation between the white population and food stamp assistance, while an inverse relationship was found between the Hispanic population and food stamp assistance. These patterns hold no real significance yet; they only serve to help us understand and familiarize ourselves with the data.
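The steps described above (setting aside NaN values, then using standard deviation and correlation to decide which relationships to explore) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbers are made up, the field names snap_pct and obesity_pct are hypothetical, and dropping incomplete rows is just one possible way to handle NaN values. It is not the team's actual data or the Watson Analytics workflow.

```python
import math
import statistics

# Hypothetical county-level records (made-up numbers, not Nutri Mondo data).
# math.nan marks a value that was not recorded.
records = [
    {"snap_pct": 10.0, "obesity_pct": 24.0},
    {"snap_pct": 14.0, "obesity_pct": 27.0},
    {"snap_pct": math.nan, "obesity_pct": 30.0},
    {"snap_pct": 18.0, "obesity_pct": 31.0},
    {"snap_pct": 22.0, "obesity_pct": math.nan},
    {"snap_pct": 25.0, "obesity_pct": 35.0},
]


def drop_missing(rows, keys):
    """Keep only the rows where every listed field has a recorded value."""
    return [r for r in rows if not any(math.isnan(r[k]) for k in keys)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Data cleaning: set aside rows with unrecorded values (an assumed policy).
clean = drop_missing(records, ["snap_pct", "obesity_pct"])
snap = [r["snap_pct"] for r in clean]
obesity = [r["obesity_pct"] for r in clean]

snap_std = statistics.stdev(snap)  # sample standard deviation
r = pearson(snap, obesity)         # r > 0: positive, r < 0: inverse
print(f"rows kept: {len(clean)}, SNAP std dev: {snap_std:.2f}, r = {r:.2f}")
```

A coefficient close to 1 would correspond to the kind of positive correlation seen in the scatter plot, while a negative coefficient would correspond to an inverse relationship. As noted above, such a number only helps in getting familiar with the data and does not by itself show significance.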
Preparing the Data
Nutri Mondo started by looking at their main question: What is the relationship between diet problems in certain areas and how easy it is to access fresh food there? The team has held video calls to make sure everyone shares the same understanding of the project. Nutri Mondo used a visual model that showed the relationship between the federal food assistance program (SNAP), which is meant to help lower-income families, and the number of stores that allow people to use SNAP benefits. They used a different color for each state. They then looked at free lunch programs in schools and tied them to childhood obesity. For this, they used a scatter plot.
Part 4: Data Modeling
The Purpose of Data Modeling
Data modeling focuses on creating models that are either descriptive or predictive. An example of a descriptive model is one that says that if a person did this, then they would do that. A predictive model yields a yes or a no. Predictive modeling uses training sets. A training set is historical data in which the outcome is already known. It also shows when the model needs to be adjusted. Data modeling answers the question in a solid way (IBM, 2016).
When Modeling Occurs
Modeling comes after data scientists understand and prepare their data because they first have to look at the training set. A training set is a set of historical data in which the outcome is already known. The training set lets the data scientists know whether the model needs to be adjusted. After looking at the training set and adjusting accordingly, the data scientists will have a model that answers the question (IBM, 2016).
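To make the idea of a training set more concrete, here is a minimal sketch of a yes-or-no predictive model built from historical data with known outcomes. Everything in it is an assumption made for illustration: the distances and outcomes are invented, and the single cut-off rule is a deliberately simple stand-in, not the model used by Nutri Mondo or described by IBM.

```python
# Hypothetical training set: (miles to the nearest fresh-food store,
# whether the area has a high rate of diet-related health problems).
# The outcomes are already known, which is what makes this a training set.
training_set = [
    (0.5, False), (1.0, False), (1.5, False), (2.0, False), (3.0, False),
    (4.0, True), (5.5, True), (6.0, True), (7.0, True),
]


def accuracy(threshold, rows):
    """Share of rows where 'distance > threshold' matches the known outcome."""
    hits = sum((distance > threshold) == outcome for distance, outcome in rows)
    return hits / len(rows)


def fit_threshold(rows):
    """Try each observed distance as a cut-off and keep the most accurate one."""
    candidates = sorted(distance for distance, _ in rows)
    return max(candidates, key=lambda t: accuracy(t, rows))


threshold = fit_threshold(training_set)
print(f"learned cut-off: {threshold} miles, "
      f"training accuracy: {accuracy(threshold, training_set):.2f}")
```

Comparing the model's answers with the known outcomes is what reveals whether the model needs to be adjusted. In this sketch, the adjustment is simply choosing a different cut-off.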
Modeling and Business Understanding
Data scientists ensure that the modeling responds to the original business understanding through evaluation. Evaluation allows the quality of the model to be assessed. It is also an opportunity to see if the model meets the initial request. It shows whether the model really answers the question (IBM, 2016).
Processes for Evaluation
The process that the Nutri Mondo team is using is the diagnostic measure, which is used to see whether the model is working as it should. The diagnostic measure applies to either a predictive model or a descriptive model. Because relationships are being examined, Nutri Mondo is using the descriptive model. The relationships examined are those between specific health problems and access to fresh food. Here, a testing set with outcomes that are already known can be used, and the model can be adjusted as needed (IBM, 2016).
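The use of a testing set with known outcomes can be sketched as follows. This is a hedged illustration only: the yes-or-no rule, the CUTOFF_MILES value, the test rows, and the 0.8 acceptance bar are all hypothetical choices, and a simple predictive rule is used here just because it is easy to check. It is not Nutri Mondo's descriptive model.

```python
# Hypothetical model under evaluation: answers "yes" when the distance to
# fresh food exceeds a cut-off. The cut-off value is assumed for illustration.
CUTOFF_MILES = 3.0


def predict(distance_miles):
    """Return True when the model expects a high rate of health problems."""
    return distance_miles > CUTOFF_MILES


# Hypothetical testing set with known outcomes, kept apart from model building.
testing_set = [
    (0.8, False), (2.5, False), (3.5, True),
    (4.5, False), (6.5, True), (2.8, True),
]


def evaluate(rows):
    """Count correct and incorrect predictions against the known outcomes."""
    counts = {"true_pos": 0, "true_neg": 0, "false_pos": 0, "false_neg": 0}
    for distance, actual in rows:
        predicted = predict(distance)
        if predicted and actual:
            counts["true_pos"] += 1
        elif not predicted and not actual:
            counts["true_neg"] += 1
        elif predicted and not actual:
            counts["false_pos"] += 1
        else:
            counts["false_neg"] += 1
    return counts


counts = evaluate(testing_set)
test_accuracy = (counts["true_pos"] + counts["true_neg"]) / len(testing_set)
needs_adjustment = test_accuracy < 0.8  # assumed acceptance bar
print(counts, f"accuracy: {test_accuracy:.2f}", f"adjust: {needs_adjustment}")
```

When the accuracy on the testing set falls below the chosen bar, that is the signal to go back and adjust the model before it is deployed.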
Part 5: Deploying Data and Working With Feedback
Deploying What Is Found in Data
The Nutri Mondo team had four options for deploying the model. The first option was to release the model to a limited number of directors at the headquarters in Miami. With this option, the feedback process would be easier and faster, and the feedback from directors would be high-level. The second option would be to limit the release of the model to directors working for Nutri Mondo. This option allows high-level feedback from directors who are fairly comfortable with data. Their insight could be valuable to the company in discovering how general the data can be. The third option is to limit the release to selected directors of regional offices in the United States and other parts of the world. This option brings the model closer to its intended users. The last option was to release the model to everyone in the organization at the same time. Doing this would bring in feedback from all levels.
How You Would Deploy the Data
If the decision were mine, I would deploy the model using option D, which was to deploy the model to the entire organization at once. I feel this way because once everybody has the model, we will be able to get feedback. The stated drawback was that this option would be difficult to manage, but if the entire organization is given a specific time frame in which to respond with feedback, I feel it could be managed better. Some people will not be able to send in feedback because of their other commitments, so we will not necessarily have feedback from everyone. Also, I feel that everyone in the organization should know exactly what is going on in the different areas, regardless of the area they are in.
The first two emails started off with things the senders liked about the model, such as seeing the national trends tie into local-level data and having public government data organized in such a way. All of the emails described what the senders have going on in their areas: Texas noted that the data provided relate to their projects (programs and health problems), Georgia described what their outreach teams are doing in relation to the data provided, Mexico said they need similar projects in their area, and Brazil compared their own data to the model provided. The email from Texas ended by saying that they would like some partnership between the two regions. At the end of the Georgia email, they said that some parts of the model need to be enlarged, asked for clarification of some of the percentages provided, and suggested that the teams visit each other to help each other out. Mexico suggested better data sets and gathering information from the regional offices in the United States to see how they would use the data. Brazil talked about how they would like more data, such as what is happening in the US with the classes provided to communities and how migration changes diets. This feedback gives the team insight for refining their model, because other areas are seeing things such as correlations within their own regions, which could improve the outcome even more.
IBM. (2016a). Analytic approach [Video file]. Armonk, NY: Author.
IBM. (2016k). Evaluation [Video file]. Armonk, NY: Author.