Data cleaning and preprocessing in R