Data Talks
Using GPT to learn Data Science
In this video I show how I use GPT to help with data science by going through one of my viewers' notebooks.
The notebook I'm going through: github.com/skeem1/From-Cafffeine-to-Cocaine/blob/main/coffee-bean-production.csv
seaborn tutorials: ua-cam.com/video/fWuPIGVPo7o/v-deo.html&pp=gAQBiAQB
Causal inference videos: ua-cam.com/video/kE-agokfsHE/v-deo.html&pp=gAQBiAQB
620 views

Videos

Future Topics
313 views · 1 year ago
Thanks so much for watching! Please comment below on what topics you'd like to see covered next!
Send Me Your Notebooks!
400 views · 1 year ago
Comment down below with a github link to notebooks you'd like me to review!
Real Life IV Examples
137 views · 1 year ago
We talk about three times when IV analysis was used in real studies
IV Example
154 views · 1 year ago
We go over an example of IV analysis
IV Proof
150 views · 1 year ago
We prove why IV analysis works
4 Types Of Patients
165 views · 1 year ago
We talk about 4 actions patients can take when given a treatment. This serves as a framework for IV analysis.
Instrumental Variables
118 views · 1 year ago
We introduce Instrumental Variables
As If Random
188 views · 1 year ago
We cover an important topic for IV analysis - the last topic of the course
Other Causal Inference Tools
191 views · 1 year ago
I talk about the most common tools in causal inference that I won't be covering: IP weighting, outcome regression, propensity scores, and G-estimation
Model Misspecification
325 views · 1 year ago
We talk about an added assumption of the parametric G formula
Parametric G Formula
1.6K views · 1 year ago
We describe my favorite causal inference technique: the parametric G formula, my go-to for any standard observational causal inference problems
Modeling Means
170 views · 1 year ago
We do a quick primer on linear regression (an ML technique) to prepare us for our next ML-based causal inference tool!
2 Problems With Standardization
267 views · 1 year ago
Today we talk about two problems with standardization: combinatorial explosions and continuous variables. This sets us up nicely for the next lecture where we talk about the solution to these problems.
Exchangeability In Observational Studies
180 views · 1 year ago
Exchangeability in observational studies is possible without confounding or selection bias
Exchangeability Review
207 views · 1 year ago
Openai Codex Writes Simple Python - With Some Help ;)
1.4K views · 2 years ago
Is Openai Codex Smarter Than A Data Scientist?!?
2.3K views · 2 years ago
Selection Bias Example 4 - Full Adherence
359 views · 3 years ago
Selection Bias Example 3
254 views · 3 years ago
Selection Bias Example 2
305 views · 3 years ago
Selection Bias Example 1
806 views · 3 years ago
Standardization With Censorship - Causal Inference
369 views · 3 years ago
Selection Bias Graphically - Causal Inference
1.3K views · 3 years ago
Confounding Example 3 - Causal Inference
310 views · 3 years ago
Confounding Example 2 - Causal Inference
392 views · 3 years ago
Confounding Example 1 - Causal Inference
750 views · 3 years ago
Request For Data!!
849 views · 3 years ago
Confounding Graphically - Causal Inference
302 views · 3 years ago
Confounding Examples - Causal Inference
677 views · 3 years ago

COMMENTS

  • @MarkkuPilarinen
    @MarkkuPilarinen 6 days ago

    Love your enthusiasm and excitement

  • @muratalparoglu1422
    @muratalparoglu1422 14 days ago

    For step 4 my solution was: filtered_chipo = chipo[chipo['item_price'] > 10]; filtered_chipo.item_name.nunique(), which also gives 31. It's always good to see different solutions.

  • @Random_Legends_Shorts
    @Random_Legends_Shorts 14 days ago

    did anyone get step 5 the exact way in the series like in the exercise?

  • @zsyftw
    @zsyftw 23 days ago

    nice

  • @ski34able
    @ski34able 2 months ago

    Thank you, finally understand like 80% of the iid assumption, thanks!

  • @bubblebath2892
    @bubblebath2892 2 months ago

    Nice tutorial

  • @bubblebath2892
    @bubblebath2892 2 months ago

    How can I actually see the median values and show them in the plot, so that I can actually compare them? For example, in the Titanic dataset I want to know if the median age of survivors among males is higher than among non-survivors, or if the median age of survivors among females is higher than among non-survivors.
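
    A minimal sketch of one way to get those numbers (assuming the seaborn-bundled Titanic dataset, which has 'sex', 'age' and 'survived' columns):

        import seaborn as sns
        import matplotlib.pyplot as plt

        titanic = sns.load_dataset('titanic')

        # exact median age per (sex, survived) group - these are the values to compare
        print(titanic.groupby(['sex', 'survived'])['age'].median())

        # the boxplot's centre line is the median, so the plot and the table agree
        sns.boxplot(data=titanic, x='sex', y='age', hue='survived')
        plt.show()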

  • @virajHostels-mv8wv
    @virajHostels-mv8wv 2 months ago

    Great

  • @imzwaza6784
    @imzwaza6784 2 months ago

    thank you sir

  • @khadijaelbeshti5123
    @khadijaelbeshti5123 2 months ago

    Thank you! this was useful and quick!

  • @gautamkulkarni7049
    @gautamkulkarni7049 4 months ago

    I could understand the whole thing very easily. Thanks :)

  • @automatewithamit
    @automatewithamit 4 months ago

    Thanks for your video! Can you please let me know if we can put that collapse and expand functionality into an actually generated pivot table in Excel using Python?

  • @user-ll3jg1by5w
    @user-ll3jg1by5w 4 months ago

    I got an InvalidArgumentError (graph execution error) in model.fit() using a CNN model. Please tell me how to resolve it @Data Talks

  • @alice9737
    @alice9737 6 months ago

    So if we use train_test split do we also need to use cross-validation?

    • @DataTalks
      @DataTalks 6 months ago

      Great question! You will always need to have a test set - that's what's going to tell you how well your model will do in production. Cross-validation is a way to have a validation set with a lower amount of data, where your validation set is what you use to optimize hyperparameters.
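
      A minimal sketch of that split (assuming scikit-learn; the dataset and model here are just placeholders):

          from sklearn.datasets import load_iris
          from sklearn.linear_model import LogisticRegression
          from sklearn.model_selection import cross_val_score, train_test_split

          X, y = load_iris(return_X_y=True)

          # hold out a test set - it is only touched once, at the very end
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

          # cross-validate on the training data to pick a hyperparameter
          for C in [0.1, 1.0, 10.0]:
              scores = cross_val_score(LogisticRegression(C=C, max_iter=1000), X_train, y_train, cv=5)
              print(C, scores.mean())

          # final, one-off check of the chosen model on the untouched test set
          final = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
          print(final.score(X_test, y_test))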

    • @alice9737
      @alice9737 6 months ago

      @@DataTalks thank you for clearing that up for me!

  • @VritanshKamal
    @VritanshKamal 6 months ago

    Precisely explained. Loved the explanation !

  • @itsreallysimple1
    @itsreallysimple1 7 months ago

    they said: IQR the data!
    me: Long tails?
    they said: Winsorize!
    me: Still long tails!?
    they said: RobustScaler! QuantileTransformer! SciKit the data!
    me: Throw the data away?
    they: ...You're new here, aren't you?
    me: Yes 😭
    YOU: Boxenplot to save the data!
    me: THANK YOU
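
    A minimal sketch of that last suggestion (assuming seaborn; the long-tailed sample here is synthetic):

        import numpy as np
        import seaborn as sns
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        long_tailed = rng.lognormal(mean=0, sigma=1.5, size=5_000)

        # a boxenplot (letter-value plot) draws extra quantile boxes out into the
        # tails, so the long tail stays visible without throwing any data away
        sns.boxenplot(x=long_tailed)
        plt.show()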

  • @karrikarthik6936
    @karrikarthik6936 7 months ago

    Please share how to increase the font size of the legend and of the feature names.
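
    Not from the video, but as a generic matplotlib pointer (the exact calls depend on how the figure was built; this is a sketch assuming an axes-level plot):

        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        ax.barh(['feature_a', 'feature_b'], [0.7, 0.3], label='importance')

        ax.legend(fontsize=14)        # larger legend text
        ax.tick_params(labelsize=12)  # larger tick labels, e.g. the feature names
        plt.show()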

  • @garciarogerio6327
    @garciarogerio6327 8 months ago

    Good tutorial. However, the revenue is incorrectly calculated as the price should be multiplied by the quantity. Also, the number of orders should be revised as there are many items in the same order, so the number of rows in the dataset is higher than the number of orders
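
    A minimal sketch of those two corrections (assuming the chipotle dataset from the exercise notebook; the URL below is the one the exercise usually loads, adjust if yours differs):

        import pandas as pd

        url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
        chipo = pd.read_csv(url, sep='\t')
        chipo['item_price'] = chipo['item_price'].str.replace('$', '', regex=False).astype(float)

        # revenue: price times quantity, summed over all line items
        revenue = (chipo['item_price'] * chipo['quantity']).sum()

        # orders: count distinct order ids, since one order spans several rows
        n_orders = chipo['order_id'].nunique()

        avg_revenue_per_order = revenue / n_orders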

  • @authentic_101
    @authentic_101 10 months ago

    Hello Nathaniel, is this playlist relevant in 2023? Please advise. Thanks.

    • @DataTalks
      @DataTalks 10 months ago

      A little bit of yes and no. It's always good to know the library! But the basics of data science have changed. I can't tell you how much I'm using ChatGPT and LLMs to do data science these days!

    • @authentic_101
      @authentic_101 10 months ago

      @@DataTalks ​thanks for the quick & honest response. I understand that ChatGPT has become everyone's personal assistant. But how do LLMs play a role, apart from being used in NLP projects? You mean, to summarize articles / documents / books etc?

  • @adamchabaane3011
    @adamchabaane3011 10 months ago

    Awesome video man, if only more people saw this!

  • @moscovita4
    @moscovita4 10 months ago

    I can't find the url :( raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names_right.csv

  • @rishabhdutt3233
    @rishabhdutt3233 10 months ago

    I thought that cap was your signature, as it was there in the last 2 videos 😂

  • @kjg2565
    @kjg2565 10 months ago

    great series, concise but definitely going in-depth enough for intermediates. did you ever do a tutorial for timeseries in the end? would love to see that

    • @DataTalks
      @DataTalks 10 months ago

      No timeseries yet! I've been spending more of my time poking into newer transformer based stuff, but an overview of pandas timeseries goodies is definitely needed!

  • @user-ps8re3ot2u
    @user-ps8re3ot2u 11 months ago

    thank you for posting good content from a data science undergrad

  • @PavanSibal
    @PavanSibal 11 months ago

    Nice tutorial. Do you have any example of React client calling parametrized publication and subscription? Thanks

  • @Mrchungcc1
    @Mrchungcc1 11 months ago

    Thank you for sharing this video. I am still not clear about your dataset. class1_points and class2_points have a shape of (5000, 10, 30). That means you created a dataset of 5000 people, 10 credit cards, and 30 features of credit cards?

  • @Mixmers
    @Mixmers 11 months ago

    your screen is not clear to viewers

  • @Lion-wm6mf
    @Lion-wm6mf 1 year ago

    For step 16, you're simply doing the overall average price, for ALL orders. The question is PER order, so I believe it should be: chipo.groupby('order_id').agg({'item_price': 'mean'})

  • @josephjoy7080
    @josephjoy7080 1 year ago

    I want to learn machine learning, but before that I have been preparing: I finished a slow 40-hour Python course by Abdul on Udemy, and on YouTube I watched a 1-hour NumPy course, a 1-hour pandas course, a 1-hour matplotlib course, and this pandas series/playlist. I wonder if I am ready to start the machine learning course on Coursera by Andrew Ng, or should I learn Linear Algebra first?

    • @DataTalks
      @DataTalks 1 year ago

      Linear algebra is great, but not a prereq for all ML. I'd recommend hopping into the Andrew Ng course!

  • @V_Wankhede
    @V_Wankhede 1 year ago

    This video is quite useful, but can you please explain points 11, 12 and 4 again?

  • @son3305
    @son3305 1 year ago

    amazing explanation! Thank you

  • @kennethstephani692
    @kennethstephani692 1 year ago

    Very informative!

  • @drewg4323
    @drewg4323 1 year ago

    Great video

  • @chigstardan7285
    @chigstardan7285 1 year ago

    Love it. Simple and to the point.

  • @user-ig6sg1in2z
    @user-ig6sg1in2z 1 year ago

    no this channel is so underrated i love this channel

  • @esspi9
    @esspi9 1 year ago

    Hey! Any plans on covering pandas2.0 problems/exercises?

    • @DataTalks
      @DataTalks 10 months ago

      Send me a link and I'll go through them! I've got a bit of time in the next couple of months!

  • @vishalyadav-mr6tx
    @vishalyadav-mr6tx 1 year ago

    Where can I get the datasets for all the case studies that you have discussed in the visualization videos?

    • @DataTalks
      @DataTalks 1 year ago

      The data is all in the exercises notebook github.com/guipsamora/pandas_exercises#getting-and-knowing

  • @nano7586
    @nano7586 1 year ago

    The title is a bit confusing. I was searching for a video on how to determine the confidence interval of the 10th-percentile. But this was about determining the confidence interval of the mean.

    • @DataTalks
      @DataTalks 1 year ago

      There are different ways of computing confidence intervals (one using the standard deviation and assuming normality, for example); this one is specifically called the percentile confidence interval. You can in fact use this method on any statistic of the data (not just the mean), so you can use it to compute the confidence interval of the 10th percentile. In the above video, instead of the function "h" being the average, just sub in the 10th percentile.
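
      A minimal sketch of that substitution (assuming NumPy; the data here is a synthetic long-tailed sample and "h" is the 10th percentile instead of the mean):

          import numpy as np

          rng = np.random.default_rng(0)
          data = rng.lognormal(size=500)

          def h(x):
              return np.percentile(x, 10)   # the statistic we want a CI for

          # bootstrap: resample with replacement many times and recompute h each time
          boot = np.array([h(rng.choice(data, size=data.size, replace=True))
                           for _ in range(10_000)])

          # the percentile confidence interval is the middle 95% of the bootstrap values
          ci_low, ci_high = np.percentile(boot, [2.5, 97.5])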

  • @maximilyen
    @maximilyen 1 year ago

    Zero to hero, I liked that 😂

  • @maximilyen
    @maximilyen 1 year ago

    Very nice

  • @maximilyen
    @maximilyen 1 year ago

    Nice

  • @StefanoVerugi
    @StefanoVerugi 1 year ago

    I think the resample method might give you unexpected results; to work with the actual dataset index (keeping the same info in the resulting record) I found the .tail(1) method more accurate. The dataset in the video was already filtered to working days only, so to find the last of the dates in the index for each month: df.groupby([df.index.year, df.index.month]).tail(1)
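
    A minimal sketch of the difference being described (assuming pandas with a business-day DatetimeIndex; the data here is synthetic):

        import numpy as np
        import pandas as pd

        idx = pd.bdate_range('2023-01-02', '2023-03-31')              # working days only
        df = pd.DataFrame({'price': np.arange(len(idx))}, index=idx)

        # resample labels each group with the calendar month-end, which can be a
        # date that never appears in the data (e.g. a Saturday)
        by_resample = df.resample('M').last()

        # groupby + tail(1) keeps the actual last working day present in the index
        by_tail = df.groupby([df.index.year, df.index.month]).tail(1)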

  • @sketchytv1321
    @sketchytv1321 1 year ago

    subscribed

  • @blazemates
    @blazemates 1 year ago

    The concept you explain is really good and feels much easier, but after seeing that better bootstrap CI PDF my mind totally changed 😢😅

  • @devoneybrandon5784
    @devoneybrandon5784 1 year ago

    Thanks for this! How are you showing the info on the variable? At 4:20.

  • @kandagadlaashokkumar663
    @kandagadlaashokkumar663 1 year ago

    9:41 Which keys do we need to press to get the associated functions of that series?

  • @alananalyst7795
    @alananalyst7795 1 year ago

    hey, new fan here!!

  • @Izzy-ve3xz
    @Izzy-ve3xz 1 year ago

    lol this video could have been 1 sentence long based on the method you showed: Padding is how to Deal with Variable Length Features and Deep Learning

  • @FIBONACCIVEGA
    @FIBONACCIVEGA 1 year ago

    I really love the way you explain. I would just like to ask if you have any video with interpretation of the plots, for example boxplot interpretation and answers to questions about this data. Thanks for all.

    • @DataTalks
      @DataTalks 1 year ago

      Feel free to send over an exercise and I'd be happy to walk through it in a vid!

    • @FIBONACCIVEGA
      @FIBONACCIVEGA 1 year ago

      @@DataTalks I shared the link on UA-cam (here), but the platform deleted my comment with the info. Do you have another way to send a file? Thanks in advance

    • @FIBONACCIVEGA
      @FIBONACCIVEGA 1 year ago

      @@DataTalks I sent you an email with the info. Thanks in advance

  • @alexrand1350
    @alexrand1350 1 year ago

    Very helpful, thanks for sharing! I'm wondering -- would your explanation for why standardization matters even when there's no effect modification still apply in a situation with a binary outcome variable? Since in that case the absolute values of the outcomes are constrained in the same way within each group of L?