Usage¶
To use pnadc in a project:
import pnadc
Extract¶
pnadc.get( quarter, year, … , **kwargs )¶
Description:¶
Download the desired survey database and return a pandas DataFrame.
Parameters:¶
[mandatory]
- quarter: int or str - desired survey quarter
- year: int or str - desired survey year
[options]
- path: str - directory in which to download and extract the data (defaults to the directory where the script is executed). WARNING: the path MUST END WITH A TRAILING SLASH
- get_docs: Boolean - whether to download (True, default) or skip (False) the doc files listed in select_files. If the given directory does not already contain an input file, keep the default.
- select_files: list - which doc files to download/extract. Defaults to only the input file. An empty list [] means all doc files will be extracted and replaced. To see which doc files are available, use the Advanced Extract method **pnadc.extract.query_docs()**.
- keep_columns: list - build the DataFrame only with the desired column list
- del_file: Boolean - whether to delete (True, default) or keep (False) the original .txt pnadc file.
- sy: Boolean - saves file without loading it if True. Default is False.
- **kwargs
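Because `path` is expected to end with a closing slash, it can help to normalize the directory before calling `pnadc.get`. The following is a minimal sketch, not part of pnadc: `with_trailing_slash` and `load_quarter` are hypothetical helper names, a forward slash is assumed to be the expected separator, and the call uses only the parameters documented above.

```python
def with_trailing_slash(path: str) -> str:
    """Return `path` ending with a forward slash (hypothetical helper)."""
    if not path:
        raise ValueError("path must not be empty")
    return path if path.endswith("/") else path + "/"


def load_quarter(quarter, year, directory, columns=None):
    """Download one quarter into `directory` and return the DataFrame.

    Assumes pnadc is installed; it is imported inside the function so the
    helper above can be used on its own.
    """
    import pnadc

    kwargs = {"path": with_trailing_slash(directory)}
    if columns is not None:
        # Restrict the DataFrame to the requested columns.
        kwargs["keep_columns"] = columns
    return pnadc.get(quarter, year, **kwargs)
```

A call such as `load_quarter(1, 2020, "data")` would then pass `path="data/"` to `pnadc.get`.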
pnadc.get_all( range_years, … , **kwargs )¶
Description:¶
Download the survey databases for the given range of years and save them as CSV files.
Parameters:¶
[mandatory]
- range_years: list or range - years to iterate and download all PNADc’s data
[options]
- path: str - directory in which to download and extract the data (defaults to the directory where the script is executed). WARNING: the path MUST END WITH A TRAILING SLASH
- get_docs: Boolean - whether to download (True, default) or skip (False) the doc files listed in select_files. If the given directory does not already contain an input file, keep the default.
- select_files: list - which doc files to download/extract. Defaults to only the input file. An empty list [] means all doc files will be extracted and replaced. To see which doc files are available, use the Advanced Extract method **pnadc.extract.query_docs()**.
- keep_columns: list - build the DataFrame only with the desired column list
- del_file: Boolean - whether to delete (True, default) or keep (False) the original .txt pnadc file.
- sy: Boolean - saves file without loading it if True. Default is False.
- **kwargs
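`range_years` accepts a list or a `range`; keep in mind that Python's `range` excludes its end value. The sketch below is illustrative only: `year_span` and `download_years` are hypothetical names, not part of pnadc, and `download_years` assumes pnadc is installed when it is called.

```python
def year_span(first: int, last: int) -> range:
    """Return a range covering `first` through `last`, both inclusive."""
    if last < first:
        raise ValueError("last must not be earlier than first")
    return range(first, last + 1)


def download_years(first, last, directory=None):
    """Download every year from `first` to `last` with pnadc.get_all.

    If `directory` is given it must already end with a slash, as the
    `path` parameter requires; otherwise the library default is used.
    """
    import pnadc  # imported lazily; assumes the package is installed

    kwargs = {}
    if directory is not None:
        kwargs["path"] = directory
    return pnadc.get_all(year_span(first, last), **kwargs)
```

For example, `download_years(2015, 2017, "data/")` would iterate over 2015, 2016 and 2017.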
pnadc.get_all( quarter, year, … , **kwargs )¶
Description:¶
Download the desired survey database and return a pandas DataFrame.
Parameters:¶
[mandatory]
- quarter: int or str - desired survey quarter
- year: int or str - desired survey year
[options]
- path: str - directory in which to download and extract the data (defaults to the directory where the script is executed). WARNING: the path MUST END WITH A TRAILING SLASH
- get_docs: Boolean - whether to download (True, default) or skip (False) the doc files listed in select_files. If the given directory does not already contain an input file, keep the default.
- select_files: list - which doc files to download/extract. Defaults to only the input file. An empty list [] means all doc files will be extracted and replaced. To see which doc files are available, use the Advanced Extract method **pnadc.extract.query_docs()**.
- keep_columns: list - build the DataFrame only with the desired column list
- del_file: Boolean - whether to delete (True, default) or keep (False) the original .txt pnadc file.
- sy: Boolean - saves file without loading it if True. Default is False.
- **kwargs
Build, Save, Unzip and Query¶
pnadc.build( data_file, input_file='input_PNADC_trimestral.txt' )¶
Description:¶
Load the given PNADC_0XXXXX.txt file into a pandas DataFrame and return it.
Parameters:¶
[mandatory]
- data_file: str - the pnadc .txt file to be loaded
- input_file: str - the .txt dictionary file. Defaults to 'input_PNADC_trimestral.txt', expected in the same directory
[options]
- keep_columns: list - build the DataFrame only with the desired column list
- del_file: Boolean - whether to delete (True, default) or keep (False) the original .txt pnadc file.
pnadc.save( df, name )¶
Description:¶
Parameters:¶
[mandatory]
- df : pd.DataFrame - the pandas DataFrame object to be saved.
- name: str - name or path+name of the file to be saved, without the extension.
pnadc.unzip( file_name, … )¶
Description:¶
Unpack the given zipped file in its given directory.
Parameters:¶
[mandatory]
- file_name: str - the zipped file to be unpacked.
[options]
- keep_zipfile: Boolean - keep the original zip file if True (default); delete it if False.
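The behaviour described above can be pictured with the standard-library `zipfile` module. This is a minimal stand-in written for illustration, not pnadc's own code: `unzip_in_place` is a hypothetical name, and only the `keep_zipfile` semantics are taken from the documentation.

```python
import os
import zipfile


def unzip_in_place(file_name: str, keep_zipfile: bool = True) -> list:
    """Extract `file_name` into its own directory and return the member names."""
    target_dir = os.path.dirname(os.path.abspath(file_name))
    with zipfile.ZipFile(file_name) as archive:
        names = archive.namelist()
        archive.extractall(target_dir)
    if not keep_zipfile:
        # Remove the archive only after the extraction has succeeded.
        os.remove(file_name)
    return names
```

Deleting the archive after extraction, rather than before, means a failed extraction leaves the original download untouched.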
pnadc.query( q, input_file='input_PNADC_trimestral.txt' )¶
Description:¶
Returns a python dictionary containing the survey description about a desired variable.
Example:¶
# Supposing the input file is in the same directory
In [1]: import pnadc as pdc
In [2]: pdc.query("V1028")
Out[2]: {'column': 'V1028', 'desc': 'Peso COM pós estratificação'}
Parameters:¶
[mandatory]
- q : str - query variable
- input_file: str - the .txt dictionary file. Defaults to 'input_PNADC_trimestral.txt', expected in the same directory
Tools¶
pnadc.tools.identify( df, … )¶
Description:¶
Identify households (longitudinally) and/or individuals (not longitudinally) by creating the df['keyDom'] and/or df['keyInd'] keys, respectively, and return the DataFrame with them.
Parameters:¶
[mandatory]
- df : pd.DataFrame - the PNADC pandas dataframe to be loaded
[options]
- key: str or NoneType - the desired key levels to be created
- args: 'dom' (households), 'ind' (individuals) or None (both, default)
- UPA, V1008, V1014, V2003: variables used to create the keys. They default to same name strings.
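One way to picture the keys is as concatenations of the identifying variables. The sketch below is an assumption made for illustration, not pnadc's implementation: it treats `UPA`, `V1008` and `V1014` as the household identifiers and appends `V2003` for the individual, and `make_keys` is a hypothetical name. It works on a plain dictionary per row rather than a DataFrame to stay dependency-free.

```python
def make_keys(row: dict, key=None) -> dict:
    """Return a copy of `row` with 'keyDom' and/or 'keyInd' added.

    `key` mirrors the documented option: 'dom', 'ind' or None (both).
    The concatenation rule itself is an assumption of this sketch.
    """
    if key not in ("dom", "ind", None):
        raise ValueError("key must be 'dom', 'ind' or None")
    household = f"{row['UPA']}{row['V1008']}{row['V1014']}"
    out = dict(row)
    if key in ("dom", None):
        out["keyDom"] = household
    if key in ("ind", None):
        # The individual key extends the household key with V2003.
        out["keyInd"] = household + str(row["V2003"])
    return out
```

With the library itself, the equivalent call would be along the lines of `pnadc.tools.identify(df, key='dom')`, using the `key` option documented above.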
pnadc.tools.deflators( df, defl_file )¶
Description:¶
Parameters:¶
[mandatory]
- df : pd.DataFrame - the PNADC pandas DataFrame to be loaded
- defl_file: str - the excel file with the deflators provided in the official docs
Advanced Extract¶
[building]