Usage¶

To use pnadc in a project:

import pnadc

Extract¶

pnadc.get( quarter, year, … , **kwargs )¶

Description:¶

Download the desired survey database and return a pandas DataFrame.

Parameters:¶

[mandatory]

quarter: int or str - desired survey quarter
year: int or str - desired survey year

[options]

path: str - full path of the directory in which to download and extract the data (defaults to the directory where the script is being executed). WARNING: THE PATH MUST END WITH A TRAILING SLASH
get_docs: Boolean - choose to download (True, default) or not (False) the doc files listed in select_files. If you don’t have an input file in the given directory, keep the default.
select_files: list - select which doc files you wish to download/extract. Defaults to only the input file. An empty list [] means all doc files will be extracted and replaced. To see which doc files are available, use the Advanced Extract method pnadc.extract.query_docs().
keep_columns: list - build the DataFrame with only the listed columns
del_file: Boolean - choose to delete (True, default) or keep (False) the original .txt PNADC file.
sy: Boolean - if True, save the file without loading it. Default is False.
**kwargs
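
The warning above requires path to end with a separator. A minimal sketch of one way to guarantee that before calling pnadc.get follows; the helper name with_trailing_sep is hypothetical and is not part of pnadc.

```python
import os


def with_trailing_sep(path):
    """Return path with a trailing separator appended if it is missing."""
    # Joining with an empty last component adds os.sep only when
    # path does not already end with a separator.
    return os.path.join(path, "")


data_dir = with_trailing_sep(os.path.join("data", "pnadc"))
print(data_dir)
```

The resulting string could then be passed as path=data_dir.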

pnadc.get_all( range_years, … , **kwargs )¶

Description:¶

Download the survey databases for every year in the given range and save them as csv files.

Parameters:¶

[mandatory]

range_years: list or range - years to iterate over, downloading all PNADC data for each

[options]

path: str - full path of the directory in which to download and extract the data (defaults to the directory where the script is being executed). WARNING: THE PATH MUST END WITH A TRAILING SLASH
get_docs: Boolean - choose to download (True, default) or not (False) the doc files listed in select_files. If you don’t have an input file in the given directory, keep the default.
select_files: list - select which doc files you wish to download/extract. Defaults to only the input file. An empty list [] means all doc files will be extracted and replaced. To see which doc files are available, use the Advanced Extract method pnadc.extract.query_docs().
keep_columns: list - build the DataFrame with only the listed columns
del_file: Boolean - choose to delete (True, default) or keep (False) the original .txt PNADC file.
sy: Boolean - if True, save the file without loading it. Default is False.
**kwargs
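
Since range_years accepts a range, note that Python's range() excludes its stop value. The sketch below builds an inclusive span of years; inclusive_years and download_years are hypothetical helpers, and the only pnadc call used is get_all with the range_years and path arguments documented above.

```python
def inclusive_years(first_year, last_year):
    """Build a range covering first_year through last_year, both included."""
    # range() stops before its second argument, so add 1 to keep last_year.
    return range(first_year, last_year + 1)


def download_years(first_year, last_year, path):
    """Hypothetical wrapper: download every year in the span with pnadc.get_all."""
    import pnadc  # imported lazily so inclusive_years works without pnadc installed

    pnadc.get_all(inclusive_years(first_year, last_year), path=path)


years = inclusive_years(2015, 2018)
print(list(years))  # → [2015, 2016, 2017, 2018]
```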

pnadc.get_all( quarter, year, … , **kwargs )¶

Description:¶

Download the desired survey database and return a pandas DataFrame.

Parameters:¶

[mandatory]

quarter: int or str - desired survey quarter
year: int or str - desired survey year

[options]

path: str - full path of the directory in which to download and extract the data (defaults to the directory where the script is being executed). WARNING: THE PATH MUST END WITH A TRAILING SLASH
get_docs: Boolean - choose to download (True, default) or not (False) the doc files listed in select_files. If you don’t have an input file in the given directory, keep the default.
select_files: list - select which doc files you wish to download/extract. Defaults to only the input file. An empty list [] means all doc files will be extracted and replaced. To see which doc files are available, use the Advanced Extract method pnadc.extract.query_docs().
keep_columns: list - build the DataFrame with only the listed columns
del_file: Boolean - choose to delete (True, default) or keep (False) the original .txt PNADC file.
sy: Boolean - if True, save the file without loading it. Default is False.
**kwargs

Build, Save, Unzip and Query¶

pnadc.build( data_file, input_file='input_PNADC_trimestral.txt' )¶

Description:¶

Load the given PNADC_0XXXXX.txt file into a pandas DataFrame and return it.

Parameters:¶

[mandatory]

data_file: str - the PNADC .txt file to be loaded
input_file: str - the .txt dictionary file. Defaults to 'input_PNADC_trimestral.txt', expecting it in the same directory

[options]

keep_columns: list - build the DataFrame with only the listed columns
del_file: Boolean - choose to delete (True, default) or keep (False) the original .txt PNADC file.

pnadc.save( df, name )¶

Description:¶

Enhancements needed.

Only saves the current DataFrame with its .to_csv method.

Parameters:¶

[mandatory]

df : pd.DataFrame - the pandas DataFrame object to be saved.
name: str - name or path+name of the file to be saved, without the extension.

pnadc.unzip( file_name, … )¶

Description:¶

Unpack the given zipped file in its given directory.

Parameters:¶

[mandatory]

file_name: str - name or path of the zipped file to be unpacked.

[options]

keep_zipfile: Boolean - delete the original zip file if False. Default is True.
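
The behaviour described above can be approximated with the standard library alone. This is only a sketch under stated assumptions, not pnadc's actual implementation: unzip_sketch is a hypothetical name, and returning the list of extracted members is a choice made here for illustration.

```python
import os
import zipfile


def unzip_sketch(file_name, keep_zipfile=True):
    """Extract file_name into its own directory; drop the archive if asked."""
    # Unpack next to the archive itself, whatever the current working directory is.
    target_dir = os.path.dirname(os.path.abspath(file_name))
    with zipfile.ZipFile(file_name) as archive:
        archive.extractall(target_dir)
        extracted = archive.namelist()
    # The archive is closed by now, so it is safe to remove it.
    if not keep_zipfile:
        os.remove(file_name)
    return extracted
```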

pnadc.query( q, input_file='input_PNADC_trimestral.txt' )¶

Description:¶

Return a Python dictionary containing the survey description of the desired variable.

Example:¶

# Supposing the input file is in the same directory
In [1]: import pnadc as pdc
In [2]: pdc.query("V1028")
Out[2]: {'column': 'V1028', 'desc': 'Peso COM pós estratificação'}

Parameters:¶

[mandatory]

q : str - query variable
input_file: str - the .txt dictionary file. Defaults to 'input_PNADC_trimestral.txt', expecting it in the same directory

Tools¶

pnadc.tools.identify( df, … )¶

Description:¶

Identify households (longitudinal) and/or individuals (not longitudinal) by creating the df['keyDom'] and/or df['keyInd'] keys, respectively, and returning them with the DataFrame.

Parameters:¶

[mandatory]

df : pd.DataFrame - the PNADC pandas dataframe to be loaded

[options]

key: str or NoneType - the desired key levels to be created
- args: 'dom' (households), 'ind' (individuals) or None (both, default)
UPA, V1008, V1014, V2003: names of the variables used to create the keys. Each defaults to a string of the same name.
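
To make the key idea concrete, here is a rough sketch of how such identifiers could be composed. Everything in it is an assumption: the source does not state which variables enter which key or how they are joined, so add_keys is a hypothetical function that guesses UPA, V1008 and V1014 for the household key and adds V2003 for the individual key. Plain dictionaries stand in for DataFrame rows, and the sample values are made up.

```python
def add_keys(rows, key=None, UPA="UPA", V1008="V1008", V1014="V1014", V2003="V2003"):
    """Add 'keyDom' and/or 'keyInd' to each row dict (illustrative only)."""
    for row in rows:
        # Assumed composition: household = UPA + V1008 + V1014
        dom = "_".join(str(row[name]) for name in (UPA, V1008, V1014))
        if key in ("dom", None):
            row["keyDom"] = dom
        if key in ("ind", None):
            # Assumed composition: individual = household key + V2003
            row["keyInd"] = dom + "_" + str(row[V2003])
    return rows


# Made-up sample rows: two people living in the same household
rows = [
    {"UPA": "110000016", "V1008": "01", "V1014": "06", "V2003": "01"},
    {"UPA": "110000016", "V1008": "01", "V1014": "06", "V2003": "02"},
]
add_keys(rows)
print(rows[0]["keyDom"], rows[0]["keyInd"])
```

Under these assumptions both rows share one keyDom while their keyInd values differ.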

pnadc.tools.deflators( df, defl_file )¶

Description:¶

Merge the given pandas DataFrame with its respective deflators from the doc files and return the result, creating a mergeable key df['uf_tri_ano'] used to match each row to df['def_Habitual'] (usual deflator) and df['def_Efetivo'] (effective deflator).

Assumes that your df contains the UF, Ano and Trimestre columns.

Parameters:¶

[mandatory]

df : pd.DataFrame - the PNADC pandas DataFrame to be loaded
defl_file: str - the excel file with the deflators provided in the official docs
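
The key-based matching described above can be illustrated as follows. This is a sketch under loud assumptions, not the library's code: the underscore-joined layout of uf_tri_ano is a guess, add_deflators is a hypothetical name, plain dictionaries stand in for DataFrame rows and for the Excel file, and the deflator values are placeholders made up for illustration.

```python
def add_deflators(rows, deflators):
    """Attach def_Habitual and def_Efetivo to each row via a uf_tri_ano key."""
    for row in rows:
        # Assumed key layout: UF, Trimestre and Ano joined by underscores
        row["uf_tri_ano"] = f"{row['UF']}_{row['Trimestre']}_{row['Ano']}"
        habitual, efetivo = deflators[row["uf_tri_ano"]]
        row["def_Habitual"] = habitual
        row["def_Efetivo"] = efetivo
    return rows


# Placeholder deflators keyed by uf_tri_ano: (usual, effective); not real figures
deflators = {"35_1_2019": (1.10, 1.12), "35_2_2019": (1.08, 1.09)}
rows = [
    {"UF": 35, "Trimestre": 1, "Ano": 2019},
    {"UF": 35, "Trimestre": 2, "Ano": 2019},
]
add_deflators(rows, deflators)
print(rows[0])
```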

Advanced Extract¶

[building]