![Data Talks](/img/default-banner.jpg)
- 225
- 1,326,400
Data Talks
Joined Jun 20, 2013
The channel's goal is to teach people how to make data-driven decisions, a.k.a. good decisions.
I started by doing videos explaining tools that will help you make decisions, including: seaborn for data visualization, patsy for feature engineering, statsmodels for causal inference (I might build better tools for this in the future), sklearn for machine learning, and keras for deep learning.
I'm currently on stage two, where I explain the basic concepts involved in data-driven decision making (you only need middle school algebra to follow along): statistical inference for talking about data you have never seen, machine learning for teaching computers to find patterns in data for you, and causal inference for learning what causes what.
Stage three is a combination of practical lessons (data-driven tips for making decisions) and, much later, a framework for making decisions.
Throughout, I will sprinkle in conversations with my friends and other data scientists.
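As a taste of how a couple of the tools above fit together, here is a minimal sketch on made-up data (the variable names and numbers are all hypothetical):

```python
# Hypothetical sketch: a DataFrame you'd explore with seaborn,
# then a first sklearn model fit on it.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=100)

# seaborn would visualize this with: sns.scatterplot(data=df, x="x", y="y")
model = LinearRegression().fit(df[["x"]], df["y"])
print(model.coef_[0])  # estimated slope, near the true value of 2
```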
Using GPT to learn Data Science
In this video I show how I use GPT to help with data science by going through one of my viewers' notebooks.
The notebook I'm going through: github.com/skeem1/From-Cafffeine-to-Cocaine/blob/main/coffee-bean-production.csv
seaborn tutorials: ua-cam.com/video/fWuPIGVPo7o/v-deo.html&pp=gAQBiAQB
Causal inference videos: ua-cam.com/video/kE-agokfsHE/v-deo.html&pp=gAQBiAQB
620 views
Videos
Future Topics
313 views · 1 year ago
Thanks so much for watching! Please comment below on what topics you'd like to see covered next!
Send Me Your Notebooks!
400 views · 1 year ago
Comment down below with a github link to notebooks you'd like me to review!
Real Life IV Examples
137 views · 1 year ago
We talk about three real studies in which IV analysis was used.
4 Types Of Patients
165 views · 1 year ago
We talk about 4 actions patients can take when given a treatment. This serves as a framework for IV analysis.
As If Random
188 views · 1 year ago
We cover an important topic for IV analysis - the last topic of the course.
Other Causal Inference Tools
191 views · 1 year ago
I talk about the most common tools in causal inference that I won't be covering: IP weighting, outcome regression, propensity scores, and G-estimation.
Model Misspecification
325 views · 1 year ago
We talk about an added assumption of the parametric G formula
Parametric G Formula
1.6K views · 1 year ago
We describe my favorite causal inference technique: the parametric G formula, my go-to for any standard observational causal inference problem.
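This is not the video's own code, but a minimal sketch of the parametric g-formula on simulated data (all names and the data-generating process are hypothetical): fit an outcome model, predict for everyone under treatment set to 1 and to 0, and average the difference over the observed confounder distribution.

```python
# Parametric g-formula sketch on simulated data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 10_000
L = rng.normal(size=n)                                      # confounder
A = (rng.random(n) < 1 / (1 + np.exp(-L))).astype(float)    # treatment depends on L
Y = 1.5 * A + 2.0 * L + rng.normal(size=n)                  # true effect of A is 1.5

df = pd.DataFrame({"L": L, "A": A, "Y": Y})
outcome_model = LinearRegression().fit(df[["A", "L"]], df["Y"])

# Standardization step: predict under A=1 and A=0, then average.
df1 = df.assign(A=1.0)
df0 = df.assign(A=0.0)
ate = (outcome_model.predict(df1[["A", "L"]])
       - outcome_model.predict(df0[["A", "L"]])).mean()
print(round(ate, 2))  # should land near the true effect of 1.5
```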
Modeling Means
170 views · 1 year ago
We do a quick primer on linear regression (an ML technique) to prepare us for our next ML-based causal inference tool!
2 Problems With Standardization
267 views · 1 year ago
Today we talk about two problems with standardization: combinatorial explosions and continuous variables. This sets us up nicely for the next lecture, where we talk about the solution to these problems.
Exchangeability In Observational Studies
180 views · 1 year ago
Exchangeability in observational studies is possible in the absence of confounding and selection bias.
Openai Codex Writes Simple Python - With Some Help ;)
1.4K views · 2 years ago
Is Openai Codex Smarter Than A Data Scientist?!?
2.3K views · 2 years ago
Selection Bias Example 4 - Full Adherence
359 views · 3 years ago
Standardization With Censorship - Causal Inference
369 views · 3 years ago
Selection Bias Graphically - Causal Inference
1.3K views · 3 years ago
Confounding Example 3 - Causal Inference
310 views · 3 years ago
Confounding Example 2 - Causal Inference
392 views · 3 years ago
Confounding Example 1 - Causal Inference
750 views · 3 years ago
Confounding Graphically - Causal Inference
302 views · 3 years ago
Confounding Examples - Causal Inference
677 views · 3 years ago
Love your enthusiasm and excitement
For step 4 my solution was: filtered_chipo = chipo[chipo['item_price'] > 10] then filtered_chipo.item_name.nunique(), which also gives 31. It's always good to see different solutions.
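For context, the commenter's filter-then-count approach can be written out in full. A self-contained sketch on a tiny stand-in DataFrame (the real exercise uses the Chipotle dataset; these rows and prices are hypothetical):

```python
import pandas as pd

# Toy stand-in for the exercise's `chipo` DataFrame.
chipo = pd.DataFrame({
    "item_name": ["Burrito", "Tacos", "Salad", "Burrito", "Chips"],
    "item_price": [11.25, 9.50, 12.00, 11.25, 2.50],
})

# Filter to items priced above $10, then count distinct item names.
filtered_chipo = chipo[chipo["item_price"] > 10]
print(filtered_chipo["item_name"].nunique())  # 2 here (Burrito, Salad)
```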
did anyone get step 5 the exact way in the series like in the exercise?
nice
Thank you, I finally understand like 80% of the IID assumption!
Nice tutorial
How can I actually see the median values - show them in the plot - so that I can actually compare them? For example, in the Titanic dataset I want to know if the median age of survivors among males is higher than among non-survivors, or if the median age of survivors among females is higher than among non-survivors.
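One way to get the numbers behind each box is to compute the grouped medians directly; here is a sketch on a hypothetical mini Titanic-like frame (the real dataset ships with seaborn as `sns.load_dataset("titanic")`):

```python
import pandas as pd

# Hypothetical mini Titanic-like frame.
df = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female", "female"],
    "survived": [0, 1, 1, 0, 1, 1],
    "age": [40.0, 22.0, 30.0, 35.0, 28.0, 24.0],
})

# Median age per (sex, survived) group - the value each boxplot line sits at.
medians = df.groupby(["sex", "survived"])["age"].median()
print(medians)
```

To put these on a seaborn boxplot, each value in `medians` can be drawn next to its box with `ax.text(...)`.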
Great
thank you sir
Thank you! this was useful and quick!
I could understand the whole thing very easily. Thanks :)
Thanks for your video! Can you please let me know if we can put that collapse and expand functionality into an actually generated pivot table in Excel using Python?
I got an InvalidArgumentError (graph execution error) in model.fit() using a CNN model. Please tell me how to resolve it @Data Talks
So if we use train_test split do we also need to use cross-validation?
Great question! You will always need to have a test set - that's what's going to tell you how well your model will do in production. Cross-validation is a way to have a validation set with a lower amount of data, where your validation set is what you use to optimize hyperparameters.
@@DataTalks thank you for clearing that up for me!
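The reply above can be sketched in code: hold out a test set once, then cross-validate only on the training portion to tune hyperparameters. Synthetic data; Ridge and its alpha are just stand-ins for whatever model and hyperparameter you are tuning.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# The test set is held out once and never used for tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Cross-validation happens on the training set only (e.g. to pick alpha).
cv_scores = cross_val_score(Ridge(alpha=1.0), X_train, y_train, cv=5)

# Final, honest performance estimate on the untouched test set.
final_score = Ridge(alpha=1.0).fit(X_train, y_train).score(X_test, y_test)
print(cv_scores.mean(), final_score)
```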
Precisely explained. Loved the explanation !
they said: IQR the data!
me: Long tails?
they said: Winsorize!
me: Still long tails!?
they said: RobustScaler! QuantileTransformer! SciKit the data!
me: Throw the data away?
they: ...You're new here, aren't you?
me: Yes 😭
YOU: Boxenplot to save the data!
me: THANK YOU
please share how to increase fontsize of legend and that of the feature names
Good tutorial. However, the revenue is incorrectly calculated: the price should be multiplied by the quantity. Also, the number of orders should be revised, since there can be many items in the same order, so the number of rows in the dataset is higher than the number of orders.
Hello Nathaniel, is this playlist relevant in 2023? Please advise. Thanks.
A little bit of yes and no. It's always good to know the library! But the basics of data science have changed. I can't tell you how much I'm using ChatGPT and LLMs to do data science these days!
@@DataTalks thanks for the quick & honest response. I understand that ChatGPT has become everyone's personal assistant. But how do LLMs play a role, apart from being used in NLP projects? You mean, to summarize articles / documents / books etc?
Awesome video man, if only more people saw this!
I can't find the url :( raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names_right.csv
I thought that cap was your signature, as it was there in the last 2 videos 😂
great series, concise but definitely going in-depth enough for intermediates. did you ever do a tutorial for timeseries in the end? would love to see that
No timeseries yet! I've been spending more of my time poking into newer transformer based stuff, but an overview of pandas timeseries goodies is definitely needed!
thank you for posting good content from a data science undergrad
Nice tutorial. Do you have any example of React client calling parametrized publication and subscription? Thanks
Thank you for sharing this video. I am still not clear about your dataset. class1_points and class2_points have a shape of (5000, 10, 30). That means you created a dataset of 5000 people, 10 credit cards, and 30 features of credit cards?
your screen is not clear to viewers
For step 16, you're simply doing the overall average price, for ALL orders. The question is PER order, so I believe it should be: chipo.groupby('order_id').agg({'item_price': 'mean'})
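To make the distinction concrete, here is a sketch on a tiny hypothetical stand-in for the dataset, contrasting the overall line-item average with per-order figures (summing line items within each order first, which is the usual reading of "per order"):

```python
import pandas as pd

# Toy stand-in for the exercise data.
chipo = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "item_price": [2.50, 7.50, 10.00, 3.00, 3.00, 6.00],
})

# Overall average line-item price (what the video computed).
overall_mean = chipo["item_price"].mean()

# Per-order: total each order first, then average across orders.
per_order = chipo.groupby("order_id")["item_price"].sum()
avg_per_order = per_order.mean()
print(overall_mean, avg_per_order)  # these differ whenever orders vary in size
```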
I want to learn machine learning, and I am preparing for it: I finished a 40-hour slow Python course by Abdul on Udemy, and I watched a 1-hour NumPy course, a 1-hour pandas course, a 1-hour matplotlib course, and this pandas series/playlist. I wonder if I am ready to start the machine learning course on Coursera by Andrew Ng, or should I learn linear algebra first?
Linear algebra is great, but not a prerequisite for all ML. I'd recommend hopping into the Andrew Ng course!
This video is quite useful, but can you please explain points 11, 12, and point 4 again?
amazing explanation! Thank you
Very informative!
Great video
Love it. Simple and to the point.
no this channel is so underrated i love this channel
Hey! Any plans on covering pandas 2.0 problems/exercises?
Send me a link and I'll go through them! I've got a bit of time in the next couple of months!
From where I can get the dataset of all the case studies that you have discussed in all the visualization videos
The data is all in the exercises notebook github.com/guipsamora/pandas_exercises#getting-and-knowing
The title is a bit confusing. I was searching for a video on how to determine the confidence interval of the 10th-percentile. But this was about determining the confidence interval of the mean.
There are different ways of computing confidence intervals (one using the standard deviation and assuming normality, for example); this one is specifically called the percentile confidence interval. You can in fact use this method on any statistic of the data (not just the mean), so you can use it to compute the confidence interval of the 10th percentile. With the above video, instead of the function "h" being the average, just sub in the 10th percentile.
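The point in this reply - that the same bootstrap recipe works for any statistic - can be sketched as follows, on synthetic data (the helper name `percentile_ci` is made up for this example):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1_000)

def percentile_ci(data, stat, n_boot=2_000, alpha=0.05, rng=rng):
    """Percentile bootstrap CI: resample, recompute the statistic,
    take the alpha/2 and 1-alpha/2 quantiles of the bootstrap values."""
    boots = [stat(rng.choice(data, size=len(data), replace=True))
             for _ in range(n_boot)]
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

lo, hi = percentile_ci(data, np.mean)                            # CI for the mean
lo10, hi10 = percentile_ci(data, lambda x: np.percentile(x, 10)) # CI for the 10th percentile
print(lo, hi, lo10, hi10)
```

The only thing that changes between the two calls is the statistic passed in, exactly as the reply describes.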
Zero to hero, I liked that 😂
Very nice
Nice
I think the resample method might give you unexpected results. To work with the actual dataset index (keeping the same info as the resulting record), I found the .tail(1) method more accurate. The dataset in the video was already filtered to working days only; to find the last of the dates in the index by month: df.groupby([df.index.year, df.index.month]).tail(1)
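The commenter's one-liner can be demonstrated end to end; a sketch on a hypothetical business-day index (the real video used its own filtered dataset):

```python
import pandas as pd

# Hypothetical business-day-only index, like the pre-filtered data described.
idx = pd.bdate_range("2023-01-02", "2023-03-31")
df = pd.DataFrame({"price": range(len(idx))}, index=idx)

# Keep the actual last row per (year, month) - real index labels survive.
last_per_month = df.groupby([df.index.year, df.index.month]).tail(1)
print(last_per_month.index)  # last business day of Jan, Feb, and Mar 2023
```

Unlike `resample("M").last()`, this keeps the original index labels rather than relabeling rows to calendar month-ends, which is the accuracy point the comment is making.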
subscribed
The concept you explain is really good and feels much easier. But after seeing that better bootstrap CI pdf, my mind totally changed 😢😅
Thanks for this! How are you showing the info on the variable? At 4:20.
9:41 Which keys do we need to press to get the associated functions of that series?
hey, new fan here!!
lol this video could have been 1 sentence long based on the method you showed: padding is how to deal with variable-length features in deep learning
I really love the way you explain. I would just like to ask if you have any video on interpreting the plots, for example boxplot interpretation, with answers to questions about the data. Thanks for all.
Feel free to send over an exercise and I'd be happy to walk through it in a vid!
@@DataTalks I shared the link on UA-cam( here), but the platform deleted my comment with the info. Do you have other way to send a file ? thanks in advance
@@DataTalks I sent you an email with the info. Thanks in advance
Very helpful, thanks for sharing! I'm wondering -- would your explanation for why standardization matters even when there's no effect modification still apply in a situation with a binary outcome variable? Since in that case the absolute values of the outcomes are constrained in the same way within each group of L?