Hi. Just to mention that when one is watching the playlist for Module 5, at the end of Lesson 3 it jumps directly to Lesson 5 (skipping Lesson 4). Otherwise, it's great material, keep up the good work.
%%time
import pandas as pd

c = 0
for i in range(1001, 5001):
    # raw f-string so the backslash in the Windows path is not read as an escape
    t = pd.read_excel(rf"WRA\WRA_{i}.xlsx")
    # count rows with age between 15 and 19 inclusive
    c += t[(t.age >= 15) & (t.age <= 19)].shape[0]
print(f"adolescent_count: {c}")

# result
# adolescent_count: 57185
# time
# CPU times: total: 1min 2s
# Wall time: 1min 6s

I tried your exact implementation, but in Python. I wanted to see which would be slower, knowing that Python is quite slow. Thanks for this, sir. I have learnt a lot about Stata from your channel.
I got my download details in about an hour. So glad they responded so quickly, given that it's a Friday. Many thanks for this info. I'm looking forward to kick-starting my lessons.
I would appreciate that very much! If each answer to a multiple-choice question appears as its own column, how does one sort out the information, considering that one person can give multiple answers (using multiple medications, for instance)? I don't know if I explained my confusion well or not.
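One possible way to approach the question above, sketched in Python with pandas rather than Stata: keep one row per respondent and one 0/1 column per answer option, then either count each option or reshape to one row per person-answer pair. The column names (`id`, `med_aspirin`, `med_insulin`, `med_statin`) and the three respondents are made up purely for illustration.

```python
import pandas as pd

# Hypothetical survey data: one row per person, one 0/1 column per medication.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "med_aspirin": [1, 0, 1],
    "med_insulin": [1, 1, 0],
    "med_statin": [0, 0, 1],
})

# All answer columns for this multiple-response question (assumed "med_" prefix).
med_cols = [c for c in df.columns if c.startswith("med_")]

# How many people ticked each option (one person can count in several columns).
option_counts = df[med_cols].sum()

# How many options each person ticked.
df["n_meds"] = df[med_cols].sum(axis=1)

# Long form: one row per (person, medication actually used).
long = df.melt(id_vars=["id"], value_vars=med_cols,
               var_name="medication", value_name="used")
long = (long[long["used"] == 1]
        .drop(columns="used")
        .sort_values(["id", "medication"])
        .reset_index(drop=True))

print(option_counts)
print(long)
```

The wide layout is convenient for per-option counts, while the long layout is usually easier for tabulating or merging by medication. In Stata, the comparable tools would be `egen` row functions and `reshape long`.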
My professor passed on to me a license he got from the university. He gave me a username and password, but when I try to download there is a serial number field. The credentials given to me do not include a serial number. Please advise.
Hello, thank you so much. I am done with this beginner's course. It has been an interesting and inspiring journey. Please, how do I get the evaluation link?
Good morning, Sir. I'm sharing my output via the links below:
Do-file (16 lines): drive.google.com/open?id=1PuUzXZmPsOJNyd_5qfBdbtY0La7fcfDG&usp=drive_fs
Excel file: docs.google.com/spreadsheets/d/1R1OmBl9E3ehthtjEp1XOka8vCkiqclMrm_VQGNEK98M/edit?usp=sharing
My experience: I spent about 30 minutes writing the lines. Exporting the Excel file took a while (that might be a fault with my laptop, though). Many thanks for the opportunity, Sir.
We are happy to serve. Many thanks for your feedback. Most of the datasets used can be obtained using the *sysuse* command. You may also access the datasets via this Dropbox link: www.dropbox.com/scl/fo/71jd6ocstpactkbk9310b/h?rlkey=gvqbupez8nhwaw6k3jzx96nq7&dl=0
Apologies for the delay in responding. Use the *sysuse* command to get *auto.dta*. Check the description for the link (www.dropbox.com/scl/fo/71jd6ocstpactkbk9310b/h?rlkey=gvqbupez8nhwaw6k3jzx96nq7&dl=0).
Using Stata:

clear
** importing the Excel document
import excel "C:\Users\DELL\Downloads\Club_teaser.xlsx", sheet("Sheet1") firstrow

** formatting the date variables
* put a comma after every character of month, split on commas, keep the first three letters
generate date_var = ustrregexra(month,"(.)","$1,")
split date_var, generate(Var) parse(",")
gen new_month = Var1 + Var2 + Var3
tostring day year, replace
gen date = day + new_month + year
gen new_date = date(date, "DMY")
format new_date %td
destring year, replace

** numbering registrations within each year by date
* (new_date) keeps the date order inside each year; a plain bysort year: does not guarantee it
bysort year (new_date): gen bisi = _n

** exporting the registration code of the eligible members
export excel month day year registeration_code using "C:\Olabisi\eligible_members.xlsx" if bisi==1, sheet("eligible_members") firstrow(variables) sheetreplace datestring("%tc")
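import pandas as pd

# show all rows when printing (set_option, not get_option, changes the setting)
pd.set_option("display.max_rows", None)

df = pd.read_excel("/content/drive/MyDrive/Club_teaser.xlsx")
df = df.astype(str)

# build a proper date from the year, month name and day columns
df["reg_date"] = pd.to_datetime(df["Year of Registration"] + df["Month of Registration"] + df["Day of Registration"], format='%Y%B%d')

# sort by date so that first() picks the earliest registration in each year
df = df.sort_values("reg_date")
selected = df.groupby(["Year of Registration"]).first()
print(f"{selected['Registration ID'].count()} members selected")

# export to excel
selected.to_excel("/content/drive/MyDrive/club_award_winners.xlsx")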
This is one of the first topics you educated me about. It really went a long way, even though I was more of a software developer. It helped me understand database cardinality and constructs of entity relationship diagrams in data modelling
2) Using Python

import pandas as pd

# Load Dataset - line 1
df = pd.read_excel(r"C:\Users\Henry\Downloads\Club_teaser.xlsx")

# Convert variables to string - line 2
df = df.astype(str)

# Generate registration date - line 3
# str.split('.').str[0] drops a trailing ".0" if a column was read as float
df['RegistrationDate'] = pd.to_datetime(df['Day of Registration'].str.split('.').str[0] + ' ' + df['Month of Registration'] + ' ' + df['Year of Registration'].str.split('.').str[0], format='%d %B %Y')

# Sort by registration date - line 4
df = df.sort_values('RegistrationDate').reset_index(drop=True)

# Select the first 100 registrations - line 5
first_100_registrations = df.head(100)
first_100_registrations
@datawithstata
# Export the selected registrations to an Excel file
# (first_100_registrations is the df.head(100) selection from the script above)
first_100_registrations.to_excel(r'C:\Users\Henry\OneDrive\Documents\DATA SCIENCE TUTORIAL\STATA\Club Teaser\selected_registrations.xlsx', index=False)
Please ignore this first attempt. I wasn't thinking... I jumped into your trap, sir.

// Load Dataset
import excel "C:\Users\Henry\OneDrive\Documents\DATA SCIENCE TUTORIAL\STATA\Club Teaser\Club_teaser.xlsx", sheet("Sheet1") firstrow clear

// Convert variables to string
tostring *, replace

// Generate registration date
gen RegistrationDate = date(DayofRegistration + " " + MonthofRegistration + " " + YearofRegistration, "DMY")
format RegistrationDate %td

// Keep the earliest registration in each year
// (RegistrationDate) keeps the date order inside each year; a plain bysort does not guarantee it
bysort YearofRegistration (RegistrationDate): keep if _n == 1

// Select the first 60 registrations
keep in 1/60

// Export the first 60 registrations to Excel
export excel RegistrationID using "C:\Users\Henry\OneDrive\Documents\DATA SCIENCE TUTORIAL\STATA\Club Teaser\Selected RegistrationID_stata3.xlsx", sheetreplace firstrow(variables)