Using regex and a conditional to substitute values in a column in Pandas



Question:

I’m trying to use a regex to replace specific values in one column of a pandas DataFrame. However, I only want the regex applied to rows selected by the values in a different column.

A basic example:

index  col1  col2
1      yes   foobar
2      yes   foo
3      no    foobar

Using the following:

df.loc[df['col1'] == 'yes', 'col2'].replace({r'(fo)o(?!bar)': r'\1'}, inplace=True, regex=True)

I expected the following result:

index  col1  col2
1      yes   foobar
2      yes   fo
3      no    foobar

This raises no error and no SettingWithCopyWarning, but it has no effect on the DataFrame. Is there another approach I can take?


Solution 1:

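Selecting with df.loc and a boolean mask returns a copy, so replace(..., inplace=True) modifies that copy and leaves df unchanged. To avoid the chained assignment, remove inplace=True and assign the result back: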

mask = df['col1'] == 'yes'
df.loc[mask, 'col2'] = df.loc[mask, 'col2'].replace({r'(fo)o(?!bar)': r'\1'}, regex=True)
print(df)
  col1    col2
1  yes  foobar
2  yes      fo
3   no  foobar
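
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question (the 1..3 index is copied from the example), first repeats the original in-place attempt to show that df stays unchanged, then applies the assign-back fix. The variable names before, after and pattern are only illustrative, and newer pandas versions may emit a warning on the in-place call.

```python
import pandas as pd

# Sample frame from the question; the index 1..3 mirrors the example.
df = pd.DataFrame(
    {"col1": ["yes", "yes", "no"], "col2": ["foobar", "foo", "foobar"]},
    index=[1, 2, 3],
)

pattern = r"(fo)o(?!bar)"  # 'foo' that is not followed by 'bar'
mask = df["col1"] == "yes"

# Original attempt: the .loc selection is a temporary copy, so the
# in-place replace changes that copy and never reaches df.
df.loc[mask, "col2"].replace({pattern: r"\1"}, regex=True, inplace=True)
before = df["col2"].tolist()

# Fix: compute the replaced values and assign them back through .loc.
df.loc[mask, "col2"] = df.loc[mask, "col2"].replace({pattern: r"\1"}, regex=True)
after = df["col2"].tolist()

print(before)
print(after)
```

Here before should still hold the original values, while after should have 'foo' shortened to 'fo' in row 2 only.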


Solution 2:

Using np.where:

import numpy as np

df.assign(
    col2=np.where(df.col1.eq('yes'), df.col2.str.replace(r'(fo)o(?!bar)', r'\1', regex=True), df.col2)
)
  col1    col2
1  yes  foobar
2  yes      fo
3   no  foobar
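
A runnable sketch of the same idea follows. One assumption: a fourth row ('no', 'foo') is added that is not in the question, because it makes the effect of the condition visible (that row matches the regex but should stay untouched). The sketch also shows a pandas-only variant using Series.mask, offered as an alternative rather than as part of the original answer. Note that the regex is evaluated on every row first, and the condition only decides which result is kept.

```python
import numpy as np
import pandas as pd

# Frame from the question plus one hypothetical row (index 4) that matches
# the regex but has col1 == 'no', so it should be left alone.
df = pd.DataFrame(
    {
        "col1": ["yes", "yes", "no", "no"],
        "col2": ["foobar", "foo", "foobar", "foo"],
    },
    index=[1, 2, 3, 4],
)

cond = df["col1"].eq("yes")
# regex=True is passed explicitly; recent pandas versions default to literal matching.
replaced = df["col2"].str.replace(r"(fo)o(?!bar)", r"\1", regex=True)

# np.where picks the replaced value where cond is True, else the original.
out_where = df.assign(col2=np.where(cond, replaced, df["col2"]))

# Pandas-only alternative: mask() swaps in `replaced` where cond is True.
out_mask = df.assign(col2=df["col2"].mask(cond, replaced))

print(out_where)
print(out_mask)
```

Both variants return a new DataFrame and leave df itself unchanged, so assign the result back (df = df.assign(...)) to keep it.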
